History log of /linux-master/arch/x86/include/asm/thread_info.h
Revision Date Author Comments
# 65c9cc9e 05-Dec-2023 H. Peter Anvin (Intel) <hpa@zytor.com>

x86/fred: Reserve space for the FRED stack frame

When using FRED, reserve space at the top of the stack frame, just
like i386 does.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-17-xin3.li@intel.com


# af0a76e1 17-May-2023 Arnd Bergmann <arnd@arndb.de>

thread_info: move function declarations to linux/thread_info.h

There are a few __weak functions in kernel/fork.c, which architectures
can override. If there is no prototype, the compiler warns about them:

kernel/fork.c:164:13: error: no previous prototype for 'arch_release_task_struct' [-Werror=missing-prototypes]
kernel/fork.c:991:20: error: no previous prototype for 'arch_task_cache_init' [-Werror=missing-prototypes]
kernel/fork.c:1086:12: error: no previous prototype for 'arch_dup_task_struct' [-Werror=missing-prototypes]

There are already prototypes in a number of architecture specific headers
that have addressed those warnings before, but it's much better to have
these in a single place so the warning no longer shows up anywhere.

Link: https://lkml.kernel.org/r/20230517131102.934196-14-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# ce3ba2af 18-Nov-2022 Alexander Potapenko <glider@google.com>

x86: Suppress KMSAN reports in arch_within_stack_frames()

arch_within_stack_frames() performs stack walking and may confuse
KMSAN by stepping on stale shadow values. To prevent false positive
reports, disable KMSAN checks in this function.

This fixes KMSAN's interoperability with CONFIG_HARDENED_USERCOPY.

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Eric Biggers <ebiggers@google.com>
Link: https://github.com/google/kmsan/issues/89
Link: https://lore.kernel.org/lkml/Y3b9AAEKp2Vr3e6O@sol.localdomain/
Link: https://lore.kernel.org/all/20221118172305.3321253-1-glider%40google.com


# ef79970d 10-Mar-2022 Tony Luck <tony.luck@intel.com>

x86/split-lock: Remove unused TIF_SLD bit

Changes to the "warn" mode of split lock handling mean that TIF_SLD is
never set.

Remove the bit, and the functions that use it.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220310204854.31752-3-tony.luck@intel.com


# 5443f98f 14-Sep-2021 Ard Biesheuvel <ardb@kernel.org>

x86: add CPU field to struct thread_info

The CPU field will be moved back into thread_info even when
THREAD_INFO_IN_TASK is enabled, so add it back to x86's definition of
struct thread_info.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>


# b5f06f64 26-Apr-2021 Balbir Singh <sblbir@amazon.com>

x86/mm: Prepare for opt-in based L1D flush in switch_mm()

The goal of this is to allow tasks that want to protect sensitive
information, against e.g. the recently found snoop assisted data sampling
vulnerabilites, to flush their L1D on being switched out. This protects
their data from being snooped or leaked via side channels after the task
has context switched out.

This could also be used to wipe L1D when an untrusted task is switched in,
but that's not a really well defined scenario while the opt-in variant is
clearly defined.

The mechanism is default disabled and can be enabled on the kernel command
line.

Prepare for the actual prctl based opt-in:

1) Provide the necessary setup functionality similar to the other
mitigations and enable the static branch when the command line option
is set and the CPU provides support for hardware assisted L1D
flushing. Software based L1D flush is not supported because it's CPU
model specific and not really well defined.

This does not come with a sysfs file like the other mitigations
because it is not bound to any specific vulnerability.

Support has to be queried via the prctl(2) interface.

2) Add TIF_SPEC_L1D_FLUSH next to L1D_SPEC_IB so the two bits can be
mangled into the mm pointer in one go which allows to reuse the
existing mechanism in switch_mm() for the conditional IBPB speculation
barrier efficiently.

3) Add the L1D flush specific functionality which flushes L1D when the
outgoing task opted in.

Also check whether the incoming task has requested L1D flush and if so
validate that it is not accidentaly running on an SMT sibling as this
makes the whole excercise moot because SMT siblings share L1D which
opens tons of other attack vectors. If that happens schedule task work
which signals the incoming task on return to user/guest with SIGBUS as
this is part of the paranoid L1D flush contract.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-1-sblbir@amazon.com


# 8aacd1ea 26-Apr-2021 Balbir Singh <sblbir@amazon.com>

x86/process: Make room for TIF_SPEC_L1D_FLUSH

The upcoming support for paranoid L1D flush in switch_mm() requires that
TIF_SPEC_IB and the new TIF_SPEC_L1D_FLUSH are two consecutive bits in
thread_info::flags.

Move TIF_SPEC_FORCE_UPDATE to a spare bit to make room for the new one.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-1-sblbir@amazon.com


# 1591584e 25-Jan-2021 Lai Jiangshan <laijs@linux.alibaba.com>

x86/process/64: Move cpu_current_top_of_stack out of TSS

cpu_current_top_of_stack is currently stored in TSS.sp1. TSS is exposed
through the cpu_entry_area which is visible with user CR3 when PTI is
enabled and active.

This makes it a coveted fruit for attackers. An attacker can fetch the
kernel stack top from it and continue next steps of actions based on the
kernel stack.

But it is actualy not necessary to be stored in the TSS. It is only
accessed after the entry code switched to kernel CR3 and kernel GS_BASE
which means it can be in any regular percpu variable.

The reason why it is in TSS is historical (pre PTI) because TSS is also
used as scratch space in SYSCALL_64 and therefore cache hot.

A syscall also needs the per CPU variable current_task and eventually
__preempt_count, so placing cpu_current_top_of_stack next to them makes it
likely that they end up in the same cache line which should avoid
performance regressions. This is not enforced as the compiler is free to
place these variables, so these entry relevant variables should move into
a data structure to make this enforceable.

The seccomp_benchmark doesn't show any performance loss in the "getpid
native" test result. Actually, the result changes from 93ns before to 92ns
with this change when KPTI is disabled. The test is very stable and
although the test doesn't show a higher degree of precision it gives enough
confidence that moving cpu_current_top_of_stack does not cause a
regression.

[ tglx: Removed unneeded export. Massaged changelog ]

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210125173444.22696-2-jiangshanlai@gmail.com


# b2e9df85 01-Feb-2021 Oleg Nesterov <oleg@redhat.com>

x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART

Save the current_thread_info()->status of X86 in the new
restart_block->arch_data field so TS_COMPAT_RESTART can be removed again.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210201174716.GA17898@redhat.com


# 8c150ba2 01-Feb-2021 Oleg Nesterov <oleg@redhat.com>

x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()

The comment in get_nr_restart_syscall() says:

* The problem is that we can get here when ptrace pokes
* syscall-like values into regs even if we're not in a syscall
* at all.

Yes, but if not in a syscall then the

status & (TS_COMPAT|TS_I386_REGS_POKED)

check below can't really help:

- TS_COMPAT can't be set

- TS_I386_REGS_POKED is only set if regs->orig_ax was changed by
32bit debugger; and even in this case get_nr_restart_syscall()
is only correct if the tracee is 32bit too.

Suppose that a 64bit debugger plays with a 32bit tracee and

* Tracee calls sleep(2) // TS_COMPAT is set
* User interrupts the tracee by CTRL-C after 1 sec and does
"(gdb) call func()"
* gdb saves the regs by PTRACE_GETREGS
* does PTRACE_SETREGS to set %rip='func' and %orig_rax=-1
* PTRACE_CONT // TS_COMPAT is cleared
* func() hits int3.
* Debugger catches SIGTRAP.
* Restore original regs by PTRACE_SETREGS.
* PTRACE_CONT

get_nr_restart_syscall() wrongly returns __NR_restart_syscall==219, the
tracee calls ia32_sys_call_table[219] == sys_madvise.

Add the sticky TS_COMPAT_RESTART flag which survives after return to user
mode. It's going to be removed in the next step again by storing the
information in the restart block. As a further cleanup it might be possible
to remove also TS_I386_REGS_POKED with that.

Test-case:

$ cvs -d :pserver:anoncvs:anoncvs@sourceware.org:/cvs/systemtap co ptrace-tests
$ gcc -o erestartsys-trap-debuggee ptrace-tests/tests/erestartsys-trap-debuggee.c --m32
$ gcc -o erestartsys-trap-debugger ptrace-tests/tests/erestartsys-trap-debugger.c -lutil
$ ./erestartsys-trap-debugger
Unexpected: retval 1, errno 22
erestartsys-trap-debugger: ptrace-tests/tests/erestartsys-trap-debugger.c:421

Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174709.GA17895@redhat.com


# 66c1b6d7 01-Feb-2021 Oleg Nesterov <oleg@redhat.com>

x86: Move TS_COMPAT back to asm/thread_info.h

Move TS_COMPAT back to asm/thread_info.h, close to TS_I386_REGS_POKED.

It was moved to asm/processor.h by b9d989c7218a ("x86/asm: Move the
thread_info::status field to thread_struct"), then later 37a8f7c38339
("x86/asm: Move 'status' from thread_struct to thread_info") moved the
'status' field back but TS_COMPAT was forgotten.

Preparatory patch to fix the COMPAT case for get_nr_restart_syscall()

Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174649.GA17880@redhat.com


# 51af3f23 15-Nov-2020 Gabriel Krisman Bertazi <krisman@collabora.com>

x86: Reclaim unused x86 TI flags

Reclaim TI flags that were migrated to syscall_work flags.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20201116174206.2639648-11-krisman@collabora.com


# b4581a52 15-Nov-2020 Gabriel Krisman Bertazi <krisman@collabora.com>

x86: Expose syscall_work field in thread_info

This field will be used by SYSCALL_WORK flags, migrated from TI flags.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20201116174206.2639648-2-krisman@collabora.com


# c8d5ed67 26-Oct-2020 Jens Axboe <axboe@kernel.dk>

x86: Wire up TIF_NOTIFY_SIGNAL

The generic entry code has support for TIF_NOTIFY_SIGNAL already. Just
provide the TIF bit.

[ tglx: Adopted to other TIF changes in x86 ]

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201026203230.386348-4-axboe@kernel.dk


# 8d71d2bf 03-Oct-2020 Gabriel Krisman Bertazi <krisman@collabora.com>

x86: Reclaim TIF_IA32 and TIF_X32

Now that these flags are no longer used, reclaim those TIF bits.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201004032536.1229030-11-krisman@collabora.com


# 47058bb5 03-Sep-2020 Christoph Hellwig <hch@lst.de>

x86: remove address space overrides using set_fs()

Stop providing the possibility to override the address space using
set_fs() now that there is no need for that any more. To properly
handle the TASK_SIZE_MAX checking for 4 vs 5-level page tables on
x86 a new alternative is introduced, which just like the one in
entry_64.S has to use the hardcoded virtual address bits to escape
the fact that TASK_SIZE_MAX isn't actually a constant when 5-level
page tables are enabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 27d6b4d1 22-Jul-2020 Thomas Gleixner <tglx@linutronix.de>

x86/entry: Use generic syscall entry function

Replace the syscall entry work handling with the generic version. Provide
the necessary helper inlines to handle the real architecture specific
parts, e.g. ptrace.

Use a temporary define for idtentry_enter_user which will be cleaned up
seperately.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220520.376213694@linutronix.de


# 6650cdd9 26-Jan-2020 Peter Zijlstra (Intel) <peterz@infradead.org>

x86/split_lock: Enable split lock detection by kernel

A split-lock occurs when an atomic instruction operates on data that spans
two cache lines. In order to maintain atomicity the core takes a global bus
lock.

This is typically >1000 cycles slower than an atomic operation within a
cache line. It also disrupts performance on other cores (which must wait
for the bus lock to be released before their memory operations can
complete). For real-time systems this may mean missing deadlines. For other
systems it may just be very annoying.

Some CPUs have the capability to raise an #AC trap when a split lock is
attempted.

Provide a command line option to give the user choices on how to handle
this:

split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock

On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it will
OOPs.

One implementation wrinkle is that the MSR to control the split lock
detection is per-core, not per thread. This might result in some short
lived races on HT systems in "warn" mode if Linux tries to enable on one
thread while disabling on the other. Race analysis by Sean Christopherson:

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between off/warn/fatal modes at runtime isn't supported
and disabling is tracked per task, so hardware will always reach a steady
state that matches the configured mode. I.e. split-lock is guaranteed to
be enabled in hardware once all _TIF_SLD threads have been scheduled out.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200126200535.GB30377@agluck-desk2.amr.corp.intel.com


# 68d87513 28-Jan-2020 Frederic Weisbecker <frederic@kernel.org>

x86: Remove TIF_NOHZ

Static keys have replaced TIF_NOHZ to optimize the calls to context
tracking. We can now safely remove that thread flag.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>


# 7c805795 23-Oct-2019 Thomas Gleixner <tglx@linutronix.de>

x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY

Evaluating _TIF_NOHZ to decide whether to use the slow syscall entry path
is not only pointless, it's actually counterproductive:

1) Context tracking code is invoked unconditionally before that flag is
evaluated.

2) If the flag is set the slow path is invoked for nothing due to #1

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>


# 2b10906f 19-Dec-2019 Brian Gerst <brgerst@gmail.com>

x86: Remove force_iret()

force_iret() was originally intended to prevent the return to user mode with
the SYSRET or SYSEXIT instructions, in cases where the register state could
have been changed to be incompatible with those instructions. The entry code
has been significantly reworked since then, and register state is validated
before SYSRET or SYSEXIT are used. force_iret() no longer serves its original
purpose and can be eliminated.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20191219115812.102620-1-brgerst@gmail.com


# 111e7b15 12-Nov-2019 Thomas Gleixner <tglx@linutronix.de>

x86/ioperm: Extend IOPL config to control ioperm() as well

If iopl() is disabled, then providing ioperm() does not make much sense.

Rename the config option and disable/enable both syscalls with it. Guard
the code with #ifdefs where appropriate.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 22fe5b04 11-Nov-2019 Thomas Gleixner <tglx@linutronix.de>

x86/ioperm: Move TSS bitmap update to exit to user work

There is no point to update the TSS bitmap for tasks which use I/O bitmaps
on every context switch. It's enough to update it right before exiting to
user space.

That reduces the context switch bitmap handling to invalidating the io
bitmap base offset in the TSS when the outgoing task has TIF_IO_BITMAP
set. The invaldiation is done on purpose when a task with an IO bitmap
switches out to prevent any possible leakage of an activated IO bitmap.

It also removes the requirement to update the tasks bitmap atomically in
ioperm().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 383c2525 03-Apr-2019 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

x86/entry: Add TIF_NEED_FPU_LOAD

Add TIF_NEED_FPU_LOAD. This flag is used for loading the FPU registers
before returning to userland. It must not be set on systems without a
FPU.

If this flag is cleared, the CPU's FPU registers hold the latest,
up-to-date content of the current task's (current()) FPU registers.
The in-memory copy (union fpregs_state) is not valid.

If this flag is set, then all of CPU's FPU registers may hold a random
value (except for PKRU) and it is required to load the content of the
FPU registers on return to userland.

Introduce it now as a preparatory change before adding the main feature.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190403164156.19645-17-bigeasy@linutronix.de


# d23650e0 28-Nov-2018 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

x86/thread_info: Remove _TIF_ALLWORK_MASK

There is no user of _TIF_ALLWORK_MASK since commit

21d375b6b34ff ("x86/entry/64: Remove the SYSCALL64 fast path").

Remove the unused define _TIF_ALLWORK_MASK.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181128222035.2996-4-bigeasy@linutronix.de


# 6d991ba5 28-Nov-2018 Thomas Gleixner <tglx@linutronix.de>

x86/speculation: Prevent stale SPEC_CTRL msr content

The seccomp speculation control operates on all tasks of a process, but
only the current task of a process can update the MSR immediately. For the
other threads the update is deferred to the next context switch.

This creates the following situation with Process A and B:

Process A task 2 and Process B task 1 are pinned on CPU1. Process A task 2
does not have the speculation control TIF bit set. Process B task 1 has the
speculation control TIF bit set.

CPU0 CPU1
MSR bit is set
ProcB.T1 schedules out
ProcA.T2 schedules in
MSR bit is cleared
ProcA.T1
seccomp_update()
set TIF bit on ProcA.T2
ProcB.T1 schedules in
MSR is not updated <-- FAIL

This happens because the context switch code tries to avoid the MSR update
if the speculation control TIF bits of the incoming and the outgoing task
are the same. In the worst case ProcB.T1 and ProcA.T2 are the only tasks
scheduling back and forth on CPU1, which keeps the MSR stale forever.

In theory this could be remedied by IPIs, but chasing the remote task which
could be migrated is complex and full of races.

The straight forward solution is to avoid the asychronous update of the TIF
bit and defer it to the next context switch. The speculation control state
is stored in task_struct::atomic_flags by the prctl and seccomp updates
already.

Add a new TIF_SPEC_FORCE_UPDATE bit and set this after updating the
atomic_flags. Check the bit on context switch and force a synchronous
update of the speculation control if set. Use the same mechanism for
updating the current task.

Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811272247140.1875@nanos.tec.linutronix.de


# 5635d999 25-Nov-2018 Thomas Gleixner <tglx@linutronix.de>

x86/speculation: Avoid __switch_to_xtra() calls

The TIF_SPEC_IB bit does not need to be evaluated in the decision to invoke
__switch_to_xtra() when:

- CONFIG_SMP is disabled

- The conditional STIPB mode is disabled

The TIF_SPEC_IB bit still controls IBPB in both cases so the TIF work mask
checks might invoke __switch_to_xtra() for nothing if TIF_SPEC_IB is the
only set bit in the work masks.

Optimize it out by masking the bit at compile time for CONFIG_SMP=n and at
run time when the static key controlling the conditional STIBP mode is
disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.374062201@linutronix.de


# 5bfbe3ad 25-Nov-2018 Tim Chen <tim.c.chen@linux.intel.com>

x86/speculation: Prepare for per task indirect branch speculation control

To avoid the overhead of STIBP always on, it's necessary to allow per task
control of STIBP.

Add a new task flag TIF_SPEC_IB and evaluate it during context switch if
SMT is active and flag evaluation is enabled by the speculation control
code. Add the conditional evaluation to x86_virt_spec_ctrl() as well so the
guest/host switch works properly.

This has no effect because TIF_SPEC_IB cannot be set yet and the static key
which controls evaluation is off. Preparatory patch for adding the control
code.

[ tglx: Simplify the context switch logic and make the TIF evaluation
depend on SMP=y and on the static key controlling the conditional
update. Rename it to TIF_SPEC_IB because it controls both STIBP and
IBPB ]

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185005.176917199@linutronix.de


# 8eb729b7 25-Nov-2018 Tim Chen <tim.c.chen@linux.intel.com>

x86/speculation: Update the TIF_SSBD comment

"Reduced Data Speculation" is an obsolete term. The correct new name is
"Speculative store bypass disable" - which is abbreviated into SSBD.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185003.593893901@linutronix.de


# 9f65fb29 09-May-2018 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

x86/bugs: Rename _RDS to _SSBD

Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2]
as SSBD (Speculative Store Bypass Disable).

Hence changing it.

It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name
is going to be. Following the rename it would be SSBD_NO but that rolls out
to Speculative Store Bypass Disable No.

Also fixed the missing space in X86_FEATURE_AMD_SSBD.

[ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 885f82bf 29-Apr-2018 Thomas Gleixner <tglx@linutronix.de>

x86/process: Allow runtime control of Speculative Store Bypass

The Speculative Store Bypass vulnerability can be mitigated with the
Reduced Data Speculation (RDS) feature. To allow finer grained control of
this eventually expensive mitigation a per task mitigation control is
required.

Add a new TIF_RDS flag and put it into the group of TIF flags which are
evaluated for mismatch in switch_to(). If these bits differ in the previous
and the next task, then the slow path function __switch_to_xtra() is
invoked. Implement the TIF_RDS dependent mitigation control in the slow
path.

If the prctl for controlling Speculative Store Bypass is disabled or no
task uses the prctl then there is no overhead in the switch_to() fast
path.

Update the KVM related speculation control functions to take TID_RDS into
account as well.

Based on a patch from Tim Chen. Completely rewritten.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>


# 37a8f7c3 28-Jan-2018 Andy Lutomirski <luto@kernel.org>

x86/asm: Move 'status' from thread_struct to thread_info

The TS_COMPAT bit is very hot and is accessed from code paths that mostly
also touch thread_info::flags. Move it into struct thread_info to improve
cache locality.

The only reason it was in thread_struct is that there was a brief period
during which arch-specific fields were not allowed in struct thread_info.

Linus suggested further changing:

ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);

to:

if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED)))
ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);

on the theory that frequently dirtying the cacheline even in pure 64-bit
code that never needs to modify status hurts performance. That could be a
reasonable followup patch, but I suspect it matters less on top of this
patch.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Link: https://lkml.kernel.org/r/03148bcc1b217100e6e8ecf6a5468c45cf4304b6.1517164461.git.luto@kernel.org


# 0500871f 02-Jan-2018 David Howells <dhowells@redhat.com>

Construct init thread stack in the linker script rather than by union

Construct the init thread stack in the linker script rather than doing it
by means of a union so that ia64's init_task.c can be got rid of.

The following symbols are then made available from INIT_TASK_DATA() linker
script macro:

init_thread_union
init_stack

INIT_TASK_DATA() also expands the region to THREAD_SIZE to accommodate the
size of the init stack. init_thread_union is given its own section so that
it can be placed into the stack space in the right order. I'm assuming
that the ia64 ordering is correct and that the task_struct is first and the
thread_info second.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Will Deacon <will.deacon@arm.com> (arm64)
Tested-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>


# c482feef 04-Dec-2017 Andy Lutomirski <luto@kernel.org>

x86/entry/64: Make cpu_entry_area.tss read-only

The TSS is a fairly juicy target for exploits, and, now that the TSS
is in the cpu_entry_area, it's no longer protected by kASLR. Make it
read-only on x86_64.

On x86_32, it can't be RO because it's written by the CPU during task
switches, and we use a task gate for double faults. I'd also be
nervous about errata if we tried to make it RO even on configurations
without double fault handling.

[ tglx: AMD confirmed that there is no problem on 64-bit with TSS RO. So
it's probably safe to assume that it's a non issue, though Intel
might have been creative in that area. Still waiting for
confirmation. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.733700132@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9aaefe7b 04-Dec-2017 Andy Lutomirski <luto@kernel.org>

x86/entry/64: Separate cpu_current_top_of_stack from TSS.sp0

On 64-bit kernels, we used to assume that TSS.sp0 was the current
top of stack. With the addition of an entry trampoline, this will
no longer be the case. Store the current top of stack in TSS.sp1,
which is otherwise unused but shares the same cacheline.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.050864668@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 196bd485 29-Sep-2017 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86/asm: Use register variable to get stack pointer value

Currently we use current_stack_pointer() function to get the value
of the stack pointer register. Since commit:

f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")

... we have a stack register variable declared. It can be used instead of
current_stack_pointer() function which allows to optimize away some
excessive "mov %rsp, %<dst>" instructions:

-mov %rsp,%rdx
-sub %rdx,%rax
-cmp $0x3fff,%rax
-ja ffffffff810722fd <ist_begin_non_atomic+0x2d>

+sub %rsp,%rax
+cmp $0x3fff,%rax
+ja ffffffff810722fa <ist_begin_non_atomic+0x2a>

Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer
and use it instead of the removed function.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5ea0727b 14-Jun-2017 Thomas Garnier <thgarnie@google.com>

x86/syscalls: Check address limit on user-mode return

Ensure the address limit is a user-mode segment before returning to
user-mode. Otherwise a process can corrupt kernel-mode memory and elevate
privileges [1].

The set_fs function sets the TIF_SETFS flag to force a slow path on
return. In the slow path, the address limit is checked to be USER_DS if
needed.

The addr_limit_user_check function is added as a cross-architecture
function to check the address limit.

[1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Will Drewry <wad@chromium.org>
Cc: linux-api@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170615011203.144108-1-thgarnie@google.com


# 96dc4f9f 16-Feb-2017 Sahara <keun-o.park@darkmatter.ae>

usercopy: Move enum for arch_within_stack_frames()

This patch moves the arch_within_stack_frames() return value enum up in
the header files so that per-architecture implementations can reuse the
same return values.

Signed-off-by: Sahara <keun-o.park@darkmatter.ae>
Signed-off-by: James Morse <james.morse@arm.com>
[kees: adjusted naming and commit log]
Signed-off-by: Kees Cook <keescook@chromium.org>


# e9ea1e7f 20-Mar-2017 Kyle Huey <me@kylehuey.com>

x86/arch_prctl: Add ARCH_[GET|SET]_CPUID

Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. Exposing this feature to userspace will allow a
ptracer to trap and emulate the CPUID instruction.

When supported, this feature is controlled by toggling bit 0 of
MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991

Implement a new pair of arch_prctls, available on both x86-32 and x86-64.

ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting
is enabled (and thus the CPUID instruction is not available) or 1 if
CPUID faulting is not enabled.

ARCH_SET_CPUID: Set the CPUID state to the second argument. If
cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will
be deactivated. Returns ENODEV if CPUID faulting is not supported on
this system.

The state of the CPUID faulting flag is propagated across forks, but reset
upon exec.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# afb94c9e 13-Feb-2017 Josh Poimboeuf <jpoimboe@redhat.com>

livepatch/x86: add TIF_PATCH_PENDING thread flag

Add the TIF_PATCH_PENDING thread flag to enable the new livepatch
per-task consistency model for x86_64. The bit getting set indicates
the thread has a pending patch which needs to be applied when the thread
exits the kernel.

The bit is placed in the _TIF_ALLWORK_MASK macro, which results in
exit_to_usermode_loop() calling klp_update_patch_state() when it's set.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org> # for the x86 changes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 3a404842 13-Feb-2017 Josh Poimboeuf <jpoimboe@redhat.com>

x86/entry: define _TIF_ALLWORK_MASK flags explicitly

The _TIF_ALLWORK_MASK macro automatically includes the least-significant
16 bits of the thread_info flags, which is less than obvious and tends
to create confusion and surprises when reading or modifying the code.

Define the flags explicitly.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org> # for the x86 changes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# c8061485 19-Oct-2016 Heiko Carstens <hca@linux.ibm.com>

sched/core, x86: Make struct thread_info arch specific again

The following commit:

c65eacbe290b ("sched/core: Allow putting thread_info into task_struct")

... made 'struct thread_info' a generic struct with only a
single ::flags member, if CONFIG_THREAD_INFO_IN_TASK_STRUCT=y is
selected.

This change however seems to be quite x86 centric, since at least the
generic preemption code (asm-generic/preempt.h) assumes that struct
thread_info also has a preempt_count member, which apparently was not
true for x86.

We could add a bit more #ifdefs to solve this problem too, but it seems
to be much simpler to make struct thread_info arch specific
again. This also makes the conversion to THREAD_INFO_IN_TASK_STRUCT a
bit easier for architectures that have a couple of arch specific stuff
in their thread_info definition.

The arch specific stuff _could_ be moved to thread_struct. However
keeping them in thread_info makes it easier: accessing thread_info
members is simple, since it is at the beginning of the task_struct,
while the thread_struct is at the end. At least on s390 the offsets
needed to access members of the thread_struct (with task_struct as
base) are too large for various asm instructions. This is not a
problem when keeping these members within thread_info.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: keescook@chromium.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1476901693-8492-2-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 15f4eae7 13-Sep-2016 Andy Lutomirski <luto@kernel.org>

x86: Move thread_info into task_struct

Now that most of the thread_info users have been cleaned up,
this is straightforward.

Most of this code was written by Linus.

Originally-from: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a50eab40abeaec9cb9a9e3cbdeafd32190206654.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b9d989c7 13-Sep-2016 Andy Lutomirski <luto@kernel.org>

x86/asm: Move the thread_info::status field to thread_struct

Because sched.h and thread_info.h are a tangled mess, I turned
in_compat_syscall() into a macro. If we had current_thread_struct()
or similar and we could use it from thread_info.h, then this would
be a bit cleaner.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ccc8a1b2f41f9c264a41f771bb4a6539a642ad72.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0100301b 12-Aug-2016 Brian Gerst <brgerst@gmail.com>

sched/x86: Rewrite the switch_to() code

Move the low-level context switch code to an out-of-line asm stub instead of
using complex inline asm. This allows constructing a new stack frame for the
child process to make it seamlessly flow to ret_from_fork without an extra
test and branch in __switch_to(). It also improves code generation for
__schedule() by using the C calling convention instead of clobbering all
registers.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1471106302-10159-5-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7e781418 02-Aug-2016 Andy Lutomirski <luto@kernel.org>

signal: consolidate {TS,TLF}_RESTORE_SIGMASK code

In general, there's no need for the "restore sigmask" flag to live in
ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.

Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.

Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.

Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 609c19a3 27-Jul-2016 Andy Lutomirski <luto@kernel.org>

x86/ptrace: Stop setting TS_COMPAT in ptrace code

Setting TS_COMPAT in ptrace is wrong: if we happen to do it during
syscall entry, then we'll confuse seccomp and audit. (The former
isn't a security problem: seccomp is currently entirely insecure if a
malicious ptracer is attached.) As a minimal fix, this patch adds a
new flag TS_I386_REGS_POKED that handles the ptrace special case.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5383ebed38b39fa37462139e337aff7f2314d1ca.1469599803.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0f60a8ef 12-Jul-2016 Kees Cook <keescook@chromium.org>

mm: Implement stack frame object validation

This creates per-architecture function arch_within_stack_frames() that
should validate if a given object is contained by a kernel stack frame.
Initial implementation is on x86.

This is based on code from PaX.

Signed-off-by: Kees Cook <keescook@chromium.org>


# 13d4ea09 14-Jul-2016 Andy Lutomirski <luto@kernel.org>

x86/uaccess: Move thread_info::addr_limit to thread_struct

struct thread_info is a legacy mess. To prepare for its partial removal,
move thread_info::addr_limit out.

As an added benefit, this way is simpler.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# dfa9a942 14-Jul-2016 Andy Lutomirski <luto@kernel.org>

x86/uaccess: Move thread_info::uaccess_err and thread_info::sig_on_uaccess_err to thread_struct

struct thread_info is a legacy mess. To prepare for its partial removal,
move the uaccess control fields out -- they're straightforward.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d0ac4d01c8e4d4d756264604e47445d5acc7900e.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# abfb9498 18-Apr-2016 Dmitry Safonov <0x7f454c46@gmail.com>

x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()

The is_ia32_task()/is_x32_task() function names are a big misnomer: they
suggests that the compat-ness of a system call is a task property, which
is not true, the compatness of a system call purely depends on how it
was invoked through the system call layer.

A task may call 32-bit and 64-bit and x32 system calls without changing
any of its kernel visible state.

This specific minomer is also actively dangerous, as it might cause kernel
developers to use the wrong kind of security checks within system calls.

So rename it to in_{ia32,x32}_syscall().

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
[ Expanded the changelog. ]
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1460987025-30360-1-git-send-email-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5f870a3f 28-Mar-2016 Borislav Petkov <bp@suse.de>

x86/thread_info: Merge two !__ASSEMBLY__ sections

We have

#ifndef __ASSEMBLY__
...
#endif

#ifndef __ASSEMBLY__
...
#endif

Merge the two.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1459189217-25532-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 9999c8c0 09-Mar-2016 Andy Lutomirski <luto@kernel.org>

x86/entry: Call enter_from_user_mode() with IRQs off

Now that slow-path syscalls always enter C before enabling
interrupts, it's straightforward to call enter_from_user_mode() before
enabling interrupts rather than doing it as part of entry tracing.

With this change, we should finally be able to retire exception_enter().

This will also enable optimizations based on knowing that we never
change context tracking state with interrupts on.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/bc376ecf87921a495e874ff98139b1ca2f5c5dd7.1457558566.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 392a6254 09-Mar-2016 Andy Lutomirski <luto@kernel.org>

x86/entry: Remove TIF_SINGLESTEP entry work

Now that SYSENTER with TF set puts X86_EFLAGS_TF directly into
regs->flags, we don't need a TIF_SINGLESTEP fixup in the syscall
entry code. Remove it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2d15f24da52dafc9d2f0b8d76f55544f4779c517.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# cd4d09ec 26-Jan-2016 Borislav Petkov <bp@suse.de>

x86/cpufeature: Carve out X86_FEATURE_*

Move them to a separate header and have the following
dependency:

x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h

This makes it easier to use the header in asm code and not
include the whole cpufeature.h and add guards for asm.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 487e3bf4 05-Oct-2015 Andy Lutomirski <luto@kernel.org>

x86/asm: Remove thread_info.sysenter_return

It's no longer needed.

We could reinstate something like it as an optimization, which
would remove two cachelines from the fast syscall entry working
set. I benchmarked it, and it makes no difference whatsoever to
the performance of cache-hot compat syscalls on Sandy Bridge.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/f08cc0cff30201afe9bb565c47134c0a6c1a96a2.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d87b7a33 28-Sep-2015 Peter Zijlstra <peterz@infradead.org>

sched/core, sched/x86: Kill thread_info::saved_preempt_count

With the introduction of the context switch preempt_count invariant,
and the demise of PREEMPT_ACTIVE, its pointless to save/restore the
per-cpu preemption count, it must always be 2.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 88cd622f 31-Jul-2015 Andy Lutomirski <luto@kernel.org>

x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks

They are no longer used. Good riddance!

Deleting the TIF_ macros is really nice. It was never clear why
there were so many variants.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/22c61682f446628573dde0f1d573ab821677e06da.1438378274.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5ed92a8a 28-Jul-2015 Brian Gerst <brgerst@gmail.com>

x86/vm86: Use the normal pt_regs area for vm86

Change to use the normal pt_regs area to enter and exit vm86
mode. This is done by increasing the padding at the top of the
stack to make room for the extra vm86 segment slots in the IRET
frame. It then saves the 32-bit regs in the off-stack vm86
data, and copies in the vm86 regs. Exiting back to 32-bit mode
does the reverse. This allows removing the hacks to jump
directly into the exit asm code due to having to change the
stack pointer. Returning normally from the vm86 syscall and the
exception handlers allows things like ptrace and auditing to work properly.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438148483-11932-5-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3a23208e 24-Apr-2015 Denys Vlasenko <dvlasenk@redhat.com>

x86/entry: Define 'cpu_current_top_of_stack' for 64-bit code

32-bit code has PER_CPU_VAR(cpu_current_top_of_stack).
64-bit code uses somewhat more obscure: PER_CPU_VAR(cpu_tss + TSS_sp0).

Define the 'cpu_current_top_of_stack' macro on CONFIG_X86_64
as well so that the PER_CPU_VAR(cpu_current_top_of_stack)
expression can be used in both 32-bit and 64-bit code.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# fed7c3f0 24-Apr-2015 Denys Vlasenko <dvlasenk@redhat.com>

x86/entry: Remove unused 'kernel_stack' per-cpu variable

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 63332a84 24-Apr-2015 Denys Vlasenko <dvlasenk@redhat.com>

x86/entry: Stop using PER_CPU_VAR(kernel_stack)

PER_CPU_VAR(kernel_stack) is redundant:

- On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0).
- On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack).

PER_CPU_VAR(kernel_stack) will be deleted by a separate change.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3050a35f 13-Jul-2014 Richard Weinberger <richard@nod.at>

x86: Remove signal translation and exec_domain

As execution domain support is gone we can remove
signal translation from the signal code and remove
exec_domain from thread_info.

Signed-off-by: Richard Weinberger <richard@nod.at>


# dca5b52a 24-Mar-2015 Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Rename THREAD_INFO() to ASM_THREAD_INFO()

The THREAD_INFO() macro has a somewhat confusingly generic name,
defined in a generic .h C header file. It also does not make it
clear that it constructs a memory operand for use in assembly
code.

Rename it to ASM_THREAD_INFO() to make it all glaringly
obvious on first glance.

Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f9d71854 24-Mar-2015 Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Merge the field offset into the THREAD_INFO() macro

Before:

TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d

After:

movl THREAD_INFO(TI_sysenter_return, %rsp, 3*8), %r10d

to turn it into a clear thread_info accessor.

No code changed:

md5:
fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.before.asm
fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.after.asm

e39f2958a5d1300158e276e4f7663263 entry_64.o.before.asm
e39f2958a5d1300158e276e4f7663263 entry_64.o.after.asm

Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/20150324184411.GB14760@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1ddc6f3c 24-Mar-2015 Ingo Molnar <mingo@kernel.org>

x86/asm/entry/64: Improve the THREAD_INFO() macro explanation

Explain the background, and add a real example.

Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/20150324184311.GA14760@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ef593260 19-Mar-2015 Denys Vlasenko <dvlasenk@redhat.com>

x86/asm/entry: Get rid of KERNEL_STACK_OFFSET

PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.

Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.

Ironically, none of them benefits from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.

This patch eliminates KERNEL_STACK_OFFSET.

PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Hopefully we can merge it later with 32-bit specific
PER_CPU_VAR(cpu_current_top_of_stack) variable...

Net result in generated code is that constants in several insns
are changed.

This change is necessary for changing struct pt_regs creation
in SYSCALL64 code path from MOV to PUSH instructions.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b3fe8ba3 19-Mar-2015 Denys Vlasenko <dvlasenk@redhat.com>

x86/asm/entry/64: Change the THREAD_INFO() definition to not depend on KERNEL_STACK_OFFSET

This changes the THREAD_INFO() definition and all its callsites
so that they do not count stack position from
(top of stack - KERNEL_STACK_OFFSET), but from top of stack.

Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS)
- "calculate thread_info's address using information that
rsp is SIZEOF_PTREGS bytes below top of stack".

While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent
"((off)-THREAD_SIZE)(reg)". The form without parentheses
falsely looks like we invoke THREAD_SIZE() macro.

Improve comment atop THREAD_INFO macro definition.

This patch does not change generated code (verified by objdump).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1daeaa31 21-Mar-2015 Brian Gerst <brgerst@gmail.com>

x86/asm/entry: Fix execve() and sigreturn() syscalls to always return via IRET

Both the execve() and sigreturn() family of syscalls have the
ability to change registers in ways that may not be compatabile
with the syscall path they were called from.

In particular, SYSRET and SYSEXIT can't handle non-default %cs and %ss,
and some bits in eflags.

These syscalls have stubs that are hardcoded to jump to the IRET path,
and not return to the original syscall path.

The following commit:

76f5df43cab5e76 ("Always allocate a complete "struct pt_regs" on the kernel stack")

recently changed this for some 32-bit compat syscalls, but introduced a bug where
execve from a 32-bit program to a 64-bit program would fail because it still returned
via SYSRETL. This caused Wine to fail when built for both 32-bit and 64-bit.

This patch sets TIF_NOTIFY_RESUME for execve() and sigreturn() so
that the IRET path is always taken on exit to userspace.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1426978461-32089-1-git-send-email-brgerst@gmail.com
[ Improved the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3ee4298f 10-Mar-2015 Andy Lutomirski <luto@amacapital.net>

x86/asm/entry: Create and use a 'TOP_OF_KERNEL_STACK_PADDING' macro

x86_32, unlike x86_64, pads the top of the kernel stack, because the
hardware stack frame formats are variable in size.

Document this padding and give it a name.

This should make no change whatsoever to the compiled kernel
image. It also doesn't fix any of the current bugs in this area.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/02bf2f54b8dcb76a62a142b6dfe07d4ef7fc582e.1426009661.git.luto@amacapital.net
[ Fixed small details, such as a missed magic constant in entry_32.S pointed out by Denys Vlasenko. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# a7fcf28d 06-Mar-2015 Andy Lutomirski <luto@amacapital.net>

x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() and fix it on x86_32

I broke 32-bit kernels. The implementation of sp0 was correct
as far as I can tell, but sp0 was much weirder on x86_32 than I
realized. It has the following issues:

- Init's sp0 is inconsistent with everything else's: non-init tasks
are offset by 8 bytes. (I have no idea why, and the comment is unhelpful.)

- vm86 does crazy things to sp0.

Fix it up by replacing this_cpu_sp0() with
current_top_of_stack() and using a new percpu variable to track
the top of the stack on x86_32.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 75182b1632a8 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 75182b16 05-Mar-2015 Andy Lutomirski <luto@amacapital.net>

x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()

This will make modifying the semantics of kernel_stack easier.

The change to ist_begin_non_atomic() is necessary because sp0 no
longer points to the same THREAD_SIZE-aligned region as RSP;
it's one byte too high for that. At Denys' suggestion, rather
than offsetting it, just check explicitly that we're in the
correct range ending at sp0. This has the added benefit that we
no longer assume that the thread stack is aligned to
THREAD_SIZE.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ef8254ad414cbb8034c9a56396eeb24f5dd5b0de.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f56141e3 12-Feb-2015 Andy Lutomirski <luto@amacapital.net>

all arches, signal: move restart_block to struct task_struct

If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target. This is because the
restart_block is held in the same memory allocation as the kernel stack.

Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.

Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.

It's also a decent simplification, since the restart code is more or less
identical on all architectures.

[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d4812e16 05-Jan-2015 Tony Luck <tony.luck@intel.com>

x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks

We now switch to the kernel stack when a machine check interrupts
during user mode. This means that we can perform recovery actions
in the tail of do_machine_check()

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>


# 83653c16 13-Nov-2014 Andy Lutomirski <luto@amacapital.net>

x86: Clean up current_stack_pointer

There's no good reason for it to be a macro, and x86_64 will want to
use it, so it should be in a header.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>


# 82975bc6 21-Nov-2014 Andy Lutomirski <luto@amacapital.net>

uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME

x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns. I suspect that this is a mistake and that
the code only works because int3 is paranoid.

Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f80c5b39 11-Apr-2014 Peter Zijlstra <peterz@infradead.org>

sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG

Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that
both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word.
This will allow us, using fetch_or(), to both set NEED_RESCHED and
check for POLLING_NRFLAG in a single operation and avoid pointless
wakeups.

Changing from the non-atomic thread_info::status flags to the atomic
thread_info::flags shouldn't be a big issue since most polling state
changes were followed/preceded by a full memory barrier anyway.

Also, fix up the apm_32 idle function, clearly that was forgotten in
the last conversion. The default idle state is !POLLING so just kill
the lot.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Link: http://lkml.kernel.org/n/tip-7yksmqtlv4nfowmlqr1rifoi@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 198d208d 06-Feb-2014 Steven Rostedt <srostedt@redhat.com>

x86: Keep thread_info on thread stack in x86_32

x86_64 uses a per_cpu variable kernel_stack to always point to
the thread stack of current. This is where the thread_info is stored
and is accessed from this location even when the irq or exception stack
is in use. This removes the complexity of having to maintain the
thread info on the stack when interrupts are running and having to
copy the preempt_count and other fields to the interrupt stack.

x86_32 uses the old method of copying the thread_info from the thread
stack to the exception stack just before executing the exception.

Having the two different requires #ifdefs and also the x86_32 way
is a bit of a pain to maintain. By converting x86_32 to the same
method of x86_64, we can remove #ifdefs, clean up the x86_32 code
a little, and remove the overhead of the copy.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 0788aa6a 06-Feb-2014 Steven Rostedt <srostedt@redhat.com>

x86: Prepare removal of previous_esp from i386 thread_info structure

The i386 thread_info contains a previous_esp field that is used
to daisy chain the different stacks for dump_stack()
(ie. irq, softirq, thread stacks).

The goal is to eventual make i386 handling of thread_info the same
as x86_64, which means that the thread_info will not be in the stack
but as a per_cpu variable. We will no longer depend on thread_info
being able to daisy chain different stacks as it will only exist
in one location (the thread stack).

By moving previous_esp to the end of thread_info and referencing
it as an offset instead of using a thread_info field, this becomes
a stepping stone to moving the thread_info.

The offset to get to the previous stack is rather ugly in this
patch, but this is only temporary and the prev_esp will be changed
in the next commit. This commit is more for sanity checks of the
change.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.891757693@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.608754481@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# b807902a 06-Feb-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386

According to a git log -p, GET_THREAD_INFO_WITH_ESP() has only been defined
and never been used. Get rid of it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140206144321.409045251@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 2432e136 06-Feb-2014 Steven Rostedt <srostedt@redhat.com>

x86: Nuke the supervisor_stack field in i386 thread_info

Nothing references the supervisor_stack in the thread_info field,
and it does not exist in x86_64. To make the two more the same,
it is being removed.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.546183789@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.203619611@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# dff38e3e 22-Oct-2013 Andi Kleen <ak@linux.intel.com>

x86: Use inline assembler instead of global register variable to get sp

LTO in gcc 4.6/47. has trouble with global register variables. They were used
to read the stack pointer. Use a simple inline assembler statement with
a mov instead.

This also helps LLVM/clang, which does not support global register
variables.

[ hpa: Ideally this should become a builtin in both gcc and clang. ]

v2: More general asm constraint. Fix description (Jan Beulich)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-6-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 00d1a39e 17-Sep-2013 Thomas Gleixner <tglx@linutronix.de>

preempt: Make PREEMPT_ACTIVE generic

No point in having this bit defined by architecture.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130917183629.090698799@linutronix.de


# c2daa3be 14-Aug-2013 Peter Zijlstra <peterz@infradead.org>

sched, x86: Provide a per-cpu preempt_count implementation

Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.

We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the
same cache-line as the other hot __switch_to() variables such as
current_task.

NOTE: this save/restore is required even for !PREEMPT kernels as
cond_resched() also relies on preempt_count's PREEMPT_ACTIVE to ignore
task_struct::state.

Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-gzn5rfsf8trgjoqx8hyayy3q@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 37f07655 03-Jul-2013 Oleg Nesterov <oleg@redhat.com>

x86: kill TIF_DEBUG

Because it is not used.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ee761f62 21-Mar-2013 Thomas Gleixner <tglx@linutronix.de>

arch: Consolidate tsk_is_polling()

Move it to a common place. Preparatory patch for implementing
set/clear for the idle need_resched poll implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130321215233.446034505@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# f431b634 12-Feb-2013 Steven Rostedt <rostedt@goodmis.org>

tracing/syscalls: Allow archs to ignore tracing compat syscalls

The tracing of ia32 compat system calls has been a bit of a pain as they
use different system call numbers than the 64bit equivalents.

I wrote a simple 'lls' program that lists files. I compiled it as a i686
ELF binary and ran it under a x86_64 box. This is the result:

echo 0 > /debug/tracing/tracing_on
echo 1 > /debug/tracing/events/syscalls/enable
echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on

grep lls /debug/tracing/trace

[.. skipping calls before TS_COMPAT is set ...]

lls-1127 [005] d... 936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420)
lls-1127 [005] d... 936.409190: sys_recvfrom -> 0x8a77000
lls-1127 [005] d... 936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
lls-1127 [005] d... 936.409215: sys_lgetxattr -> 0xf76ff000
lls-1127 [005] d... 936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4)
lls-1127 [005] d... 936.409228: sys_dup2 -> 0xfffffffffffffffe
lls-1127 [005] d... 936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000)
lls-1127 [005] d... 936.409242: sys_newfstat -> 0x3
lls-1127 [005] d... 936.409243: sys_removexattr(pathname: 3, name: ffcd0060)
lls-1127 [005] d... 936.409244: sys_removexattr -> 0x0
lls-1127 [005] d... 936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2)
lls-1127 [005] d... 936.409248: sys_lgetxattr -> 0xf76e5000
lls-1127 [005] d... 936.409248: sys_newlstat(filename: 3, statbuf: 19614)
lls-1127 [005] d... 936.409249: sys_newlstat -> 0x0
lls-1127 [005] d... 936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000)
lls-1127 [005] d... 936.409279: sys_newfstat -> 0x3
lls-1127 [005] d... 936.409279: sys_close(fd: 3)
lls-1127 [005] d... 936.421550: sys_close -> 0x200
lls-1127 [005] d... 936.421558: sys_removexattr(pathname: 3, name: ffcd00d0)
lls-1127 [005] d... 936.421560: sys_removexattr -> 0x0
lls-1127 [005] d... 936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802)
lls-1127 [005] d... 936.421574: sys_lgetxattr -> 0x4d564000
lls-1127 [005] d... 936.421575: sys_capget(header: 4d70f000, dataptr: 1000)
lls-1127 [005] d... 936.421580: sys_capget -> 0x0
lls-1127 [005] d... 936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812)
lls-1127 [005] d... 936.421589: sys_lgetxattr -> 0x4d710000
lls-1127 [005] d... 936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32)
lls-1127 [005] d... 936.426141: sys_lgetxattr -> 0x4d713000
lls-1127 [005] d... 936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0)
lls-1127 [005] d... 936.426146: sys_newlstat -> 0x0
lls-1127 [005] d... 936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)

Obviously I'm not calling newfstat with a fd of 4d55b085. The calls are
obviously incorrect, and confusing.

Other efforts have been made to fix this:

https://lkml.org/lkml/2012/3/26/367

But the real solution is to rewrite the syscall internals and come up
with a fixed solution. One that doesn't require all the kluge that the
current solution has.

Thus for now, instead of outputting incorrect data, simply ignore them.
With this patch the changes now have:

#> grep lls /debug/tracing/trace
#>

Compat system calls simply are not traced. If users need compat
syscalls, then they should just use the raw syscall tracepoints.

For an architecture to make their compat syscalls ignored, it must
define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and also
define an arch_trace_is_compat_syscall() function that will return true
if the current task should ignore tracing the syscall.

I want to stress that this change does not affect actual syscalls in any
way, shape or form. It is only used within the tracing system and
doesn't interfere with the syscall logic at all. The changes are
consolidated nicely into trace_syscalls.c and asm/ftrace.h.

I had to make one small modification to asm/thread_info.h and that was
to remove the include of asm/ftrace.h. As asm/ftrace.h required the
current_thread_info() it was causing include hell. That include was
added back in 2008 when the function graph tracer was added:

commit caf4b323 "tracing, x86: add low level support for ftrace return tracing"

It does not need to be included there.

Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.home

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# bf5a3c13 11-Jul-2012 Frederic Weisbecker <fweisbec@gmail.com>

x86: Syscall hooks for userspace RCU extended QS

Add syscall slow path hooks to notify syscall entry
and exit on CPUs that want to support userspace RCU
extended quiescent state.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>


# e76623d6 02-Aug-2012 Al Viro <viro@zeniv.linux.org.uk>

x86: get rid of TIF_IRET hackery

TIF_NOTIFY_RESUME will work in precisely the same way; all that
is achieved by TIF_IRET is appearing that there's some work to be
done, so we end up on the iret exit path. Just use NOTIFY_RESUME.
And for execve() do that in 32bit start_thread(), not sys_execve()
itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# edd63a27 27-Apr-2012 Al Viro <viro@zeniv.linux.org.uk>

set_restore_sigmask() is never called without SIGPENDING (and never should be)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 4ebefe3e 26-Apr-2012 Al Viro <viro@zeniv.linux.org.uk>

new helpers: {clear,test,test_and_clear}_restore_sigmask()

helpers parallel to set_restore_sigmask(), used in the next commits

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# c6ae41e7 11-May-2012 Alex Shi <alex.shi@linux.alibaba.com>

x86: replace percpu_xxx funcs with this_cpu_xxx

Since percpu_xxx() serial functions are duplicated with this_cpu_xxx().
Removing percpu_xxx() definition and replacing them by this_cpu_xxx()
in code. There is no function change in this patch, just preparation for
later percpu_xxx serial function removing.

On x86 machine the this_cpu_xxx() serial functions are same as
__this_cpu_xxx() without no unnecessary premmpt enable/disable.

Thanks for Stephen Rothwell, he found and fixed a i386 build error in
the patch.

Also thanks for Andrew Morton, he kept updating the patchset in Linus'
tree.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 38e7c572 05-May-2012 Thomas Gleixner <tglx@linutronix.de>

x86: Use common threadinfo allocator

The only difference is the free_thread_info function, which frees
xstate.

Use the new arch_release_task_struct() function instead and switch
over to the core allocator.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120505150141.559556763@linutronix.de
Cc: x86@kernel.org


# 6c0a9fa6 05-May-2012 Thomas Gleixner <tglx@linutronix.de>

fork: Remove the weak insanity

We error out when compiling with gcc4.1.[01] as it miscompiles
__weak. The workaround with magic defines is not longer
necessary. Make it __weak again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120505150141.306358267@linutronix.de


# 0326f5a9 13-Mar-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes/core: Handle breakpoint and singlestep exceptions

Uprobes uses exception notifiers to get to know if a thread hit
a breakpoint or a singlestep exception.

When a thread hits a uprobe or is singlestepping post a uprobe
hit, the uprobe exception notifier sets its TIF_UPROBE bit,
which will then be checked on its return to userspace path
(do_notify_resume() ->uprobe_notify_resume()), where the
consumers handlers are run (in task context) based on the
defined filters.

Uprobe hits are thread specific and hence we need to maintain
information about if a task hit a uprobe, what uprobe was hit,
the slot where the original instruction was copied for xol so
that it can be singlestepped with appropriate fixups.

In some cases, special care is needed for instructions that are
executed out of line (xol). These are architecture specific
artefacts, such as handling RIP relative instructions on x86_64.

Since the instruction at which the uprobe was inserted is
executed out of line, architecture specific fixups are added so
that the thread continues normal execution in the presence of a
uprobe.

Postpone the signals until we execute the probed insn.
post_xol() path does a recalc_sigpending() before return to
user-mode, this ensures the signal can't be lost.

Uprobes relies on DIE_DEBUG notification to notify if a
singlestep is complete.

Adds x86 specific uprobe exception notifiers and appropriate
hooks needed to determine a uprobe hit and subsequent post
processing.

Add requisite x86 fixups for xol for uprobes. Specific cases
needing fixups include relative jumps (x86_64), calls, etc.

Where possible, we check and skip singlestepping the
breakpointed instructions. For now we skip single byte as well
as few multibyte nop instructions. However this can be extended
to other instructions too.

Credits to Oleg Nesterov for suggestions/patches related to
signal, breakpoint, singlestep handling code.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120313180011.29771.89027.sendpatchset@srdronam.in.ibm.com
[ Performed various cleanliness edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ef334a20 13-Mar-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

x86: Move is_ia32_task to asm/thread_info.h from asm/compat.h

is_ia32_task() is useful even in !CONFIG_COMPAT cases - utrace will
use it for example. Hence move it to a more generic file: asm/thread_info.h

Also now is_ia32_task() returns true if CONFIG_X86_32 is defined.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120313140303.17134.1401.sendpatchset@srdronam.in.ibm.com
[ Performed minor cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# bb212724 14-Feb-2012 H. Peter Anvin <hpa@zytor.com>

x32: Add a thread flag for x32 processes

An x32 process is *almost* the same thing as a 64-bit process with a
32-bit address limit, but there are a few minor differences -- in
particular core dumps are 32 bits and signal handling is different.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 6bd33008 06-Feb-2012 H. Peter Anvin <hpa@zytor.com>

x86: Factor out TIF_IA32 from 32-bit address space

Factor out IA32 (compatibility instruction set) from 32-bit address
space in the thread_info flags; this is a precondition patch for x32
support.

Originally-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-4pr1xnnksprt7t0h3w5fw4rv@git.kernel.org


# f94edacf 17-Feb-2012 Linus Torvalds <torvalds@linux-foundation.org>

i387: move TS_USEDFPU flag from thread_info to task_struct

This moves the bit that indicates whether a thread has ownership of the
FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
(called 'has_fpu') in task_struct->thread.has_fpu.

This fixes two independent bugs at the same time:

- changing 'thread_info->status' from the scheduler causes nasty
problems for the other users of that variable, since it is defined to
be thread-synchronous (that's what the "TS_" part of the naming was
supposed to indicate).

So perfectly valid code could (and did) do

ti->status |= TS_RESTORE_SIGMASK;

and the compiler was free to do that as separate load, or and store
instructions. Which can cause problems with preemption, since a task
switch could happen in between, and change the TS_USEDFPU bit. The
change to TS_USEDFPU would be overwritten by the final store.

In practice, this seldom happened, though, because the 'status' field
was seldom used more than once, so gcc would generally tend to
generate code that used a read-modify-write instruction and thus
happened to avoid this problem - RMW instructions are naturally low
fat and preemption-safe.

- On x86-32, the current_thread_info() pointer would, during interrupts
and softirqs, point to a *copy* of the real thread_info, because
x86-32 uses %esp to calculate the thread_info address, and thus the
separate irq (and softirq) stacks would cause these kinds of odd
thread_info copy aliases.

This is normally not a problem, since interrupts aren't supposed to
look at thread information anyway (what thread is running at
interrupt time really isn't very well-defined), but it confused the
heck out of irq_fpu_usable() and the code that tried to squirrel
away the FPU state.

(It also caused untold confusion for us poor kernel developers).

It also turns out that using 'task_struct' is actually much more natural
for most of the call sites that care about the FPU state, since they
tend to work with the task struct for other reasons anyway (ie
scheduling). And the FPU data that we are going to save/restore is
found there too.

Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to
the %esp issue.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Reported-and-tested-by: Raphael Prevost <raphael@buro.asia>
Acked-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f1044868 10-Jan-2012 Anton Vorontsov <cbouatmailru@gmail.com>

x86: Get rid of dubious one-bit signed bitfield

This very noisy sparse warning appears on almost every file in
the kernel:

CHECK init/main.c
arch/x86/include/asm/thread_info.h:43:55: error: dubious one-bit
signed bitfield arch/x86/include/asm/thread_info.h:44:46: error:
dubious one-bit signed bitfield

Sparse is right and this patch changes sig_on_uaccess_error and
uaccess_err flags to unsigned type and thus fixes the warning.

Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: Andy Lutomirski <luto@mit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Dan Carpenter <error27@gmail.com>
Link: http://lkml.kernel.org/r/20120111011146.GA30428@oksana.dev.rtsoft.ru
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# bccd1729 10-Jan-2012 Anton Vorontsov <cbouatmailru@gmail.com>

x86: Get rid of 'dubious one-bit signed bitfield' sprase warning

This very noisy sparse warning appears on almost every file in the
kernel:

CHECK init/main.c
arch/x86/include/asm/thread_info.h:43:55: error: dubious one-bit signed bitfield
arch/x86/include/asm/thread_info.h:44:46: error: dubious one-bit signed bitfield

This patch changes sig_on_uaccess_error and uaccess_err flags to unsigned
type and thus fixes the warning.

Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: Andy Lutomirski <luto@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 46db09d3 29-Nov-2011 Jan Beulich <JBeulich@suse.com>

x86-64: Slightly shorten line system call entry and exit paths

GET_THREAD_INFO() involves a memory read immediately followed by
an "sub" on the value read, in turn (in several cases)
immediately followed by a use of the calculated value as the
base address of a memory access. This combination of
instructions has a non-negligible potential for stalls.

In the system call entry point code, however, the (fixed) offset
of the stack pointer from the end of the stack is generally
known, and hence we can instead avoid the memory load and
subtract, and instead do the memory reference using %rsp as the
base register. To do so in a legible fashion, introduce a
THREAD_INFO() macro which, provided a register (generally %rsp)
and the known offset from the end of the stack, produces a
suitable memory access operand.

The patch attempts to only touch the fast paths (no auditing and
alike), but manages to do so only in the 64-bit entry point
case; the compatibility mode entry points have so many
interdependencies between their various branch targets that it
was necessary to also adjust the slow paths to eliminate the
risk of having missed some register dependency during code
analysis.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/4ED4CD690200007800064075@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4fc34901 07-Nov-2011 Andy Lutomirski <luto@amacapital.net>

x86-64: Set siginfo and context on vsyscall emulation faults

To make this work, we teach the page fault handler how to send
signals on failed uaccess. This only works for user addresses
(kernel addresses will never hit the page fault handler in the
first place), so we need to generate signals for those
separately.

This gets the tricky case right: if the user buffer spans
multiple pages and only the second page is invalid, we set
cr2 and si_addr correctly. UML relies on this behavior to
"fault in" pages as needed.

We steal a bit from thread_info.uaccess_err to enable this.
Before this change, uaccess_err was a 32-bit boolean value.

This fixes issues with UML when vsyscall=emulate.

Reported-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d88e4cb6 21-Nov-2011 Tejun Heo <tj@kernel.org>

freezer: remove now unused TIF_FREEZE

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org


# 60063497 26-Jul-2011 Arun Sharma <asharma@fb.com>

atomic: use <linux/atomic.h>

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b6a84016 22-Mar-2011 Eric Dumazet <eric.dumazet@gmail.com>

mm: NUMA aware alloc_thread_info_node()

Add a node parameter to alloc_thread_info(), and change its name to
alloc_thread_info_node()

This change is needed to allow NUMA aware kthread_create_on_cpu()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e45b7fa2 24-May-2010 Len Brown <len.brown@intel.com>

sched: clarify commment for TS_POLLING

TS_POLLING set tells the scheduler an idle_task will poll
need_resched() to look for work.

TS_POLLING clear tells resched_task() and wake_up_idle_cpu()
that the remote CPU's idle_task is now sleeping in idle,
and thus requires a reschedule interrupt notice work.

Update the description of TS_POLLING to reflect how it works.
"idle task polling need_resched, skip sending interrupt"

Wordsmithing-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>


# 0ddc9324 14-May-2010 Andreas Dilger <adilger@dilger.ca>

add descriptive comment for TIF_MEMDIE task flag declaration.

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# c9ad4882 06-May-2010 Avi Kivity <avi@redhat.com>

x86: Eliminate TS_XSAVE

The fpu code currently uses current->thread_info->status & TS_XSAVE as
a way to distinguish between XSAVE capable processors and older processors.
The decision is not really task specific; instead we use the task status to
avoid a global memory reference - the value should be the same across all
threads.

Eliminate this tie-in into the task structure by using an alternative
instruction keyed off the XSAVE cpu feature; this results in shorter and
faster code, without introducing a global memory reference.

[ hpa: in the future, this probably should use an asm jmp ]

Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# ea8e61b7 25-Mar-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

x86, ptrace: Fix block-step

Implement ptrace-block-step using TIF_BLOCKSTEP which will set
DEBUGCTLMSR_BTF when set for a task while preserving any other
DEBUGCTLMSR bits.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100325135414.017536066@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# faa4602e 25-Mar-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

x86, perf, bts, mm: Delete the never used BTS-ptrace code

Support for the PMU's BTS features has been upstreamed in
v2.6.32, but we still have the old and disabled ptrace-BTS,
as Linus noticed it not so long ago.

It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
regard for other uses (perf) and doesn't provide the flexibility
needed for perf either.

Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
was never used and ptrace-block-step can be implemented using a
much simpler approach.

So axe all 3000 lines of it. That includes the *locked_memory*()
APIs in mm/mlock.c as well.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20100325135413.938004390@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 05d43ed8 28-Jan-2010 H. Peter Anvin <hpa@zytor.com>

x86: get rid of the insane TIF_ABI_PENDING bit

Now that the previous commit made it possible to do the personality
setting at the point of no return, we do just that for ELF binaries.
And suddenly all the reasons for that insane TIF_ABI_PENDING bit go
away, and we can just make SET_PERSONALITY() just do the obvious thing
for a 32-bit compat process.

Everything becomes much more straightforward this way.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7c68af6e 19-Sep-2009 Avi Kivity <avi@redhat.com>

core, x86: Add user return notifiers

Add a general per-cpu notifier that is called whenever the kernel is
about to return to userspace. The notifier uses a thread_info flag
and existing checks, so there is no impact on user return or context
switch fast paths.

This will be used initially to speed up KVM task switching by lazily
updating MSRs.

Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1253342422-13811-1-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 66700001 24-Aug-2009 Josh Stone <jistone@redhat.com>

tracing: Rename FTRACE_SYSCALLS for tracepoints

s/HAVE_FTRACE_SYSCALLS/HAVE_SYSCALL_TRACEPOINTS/g
s/TIF_SYSCALL_FTRACE/TIF_SYSCALL_TRACEPOINT/g

The syscall enter/exit tracing is no longer specific to just ftrace, so
they now have names that reflect their tie to tracepoints instead.

Signed-off-by: Josh Stone <jistone@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <1251150194-1713-2-git-send-email-jistone@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# ed8d9adf 02-Aug-2009 Linus Torvalds <torvalds@linux-foundation.org>

x86, percpu: Add 'percpu_read_stable()' interface for cacheable accesses

This is very useful for some common things like 'get_current()' and
'get_thread_info()', which can be used multiple times in a function, and
where the result is cacheable.

tj: Added the magical undocumented "P" modifier to UP __percpu_arg()
to force gcc to dereference the pointer value passed in via the
"p" input constraint. Without this, percpu_read_stable() returns
the address of the percpu variable. Also added comment explaining
the difference between percpu_read() and percpu_read_stable().

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# c99e6efe 10-Jul-2009 Peter Zijlstra <a.p.zijlstra@chello.nl>

sched: INIT_PREEMPT_COUNT

Pull the initial preempt_count value into a single
definition site.

Maintainers for: alpha, ia64 and m68k, please have a look,
your arch code is funny.

The header magic is a bit odd, but similar to the KERNEL_DS
one, CPP waits with expanding these macros until the
INIT_THREAD_INFO macro itself is expanded, which is in
arch/*/kernel/init_task.c where we've already included
sched.h so we're good.

Cc: tony.luck@intel.com
Cc: rth@twiddle.net
Cc: geert@linux-m68k.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b1eeab67 25-Nov-2008 Vegard Nossum <vegard.nossum@gmail.com>

kmemcheck: add hooks for the page allocator

This adds support for tracking the initializedness of memory that
was allocated with the page allocator. Highmem requests are not
tracked.

Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>

[build fix for !CONFIG_KMEMCHECK]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>


# 925d519a 30-Mar-2009 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf_counter: unify and fix delayed counter wakeup

While going over the wakeup code I noticed delayed wakeups only work
for hardware counters but basically all software counters rely on
them.

This patch unifies and generalizes the delayed wakeup to fix this
issue.

Since we're dealing with NMI context bits here, use a cmpxchg() based
single link list implementation to track counters that have pending
wakeups.

[ This should really be generic code for delayed wakeups, but since we
cannot use cmpxchg()/xchg() in generic code, I've let it live in the
perf_counter code. -- Eric Dumazet could use it to aggregate the
network wakeups. ]

Furthermore, the x86 method of using TIF flags was flawed in that its
quite possible to end up setting the bit on the idle task, loosing the
wakeup.

The powerpc method uses per-cpu storage and does appear to be
sufficient.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090330171023.153932974@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b407fc57 18-Feb-2009 Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

x86/paravirt: flush pending mmu updates on context switch

Impact: allow preemption during lazy mmu updates

If we're in lazy mmu mode when context switching, leave
lazy mmu mode, but remember the task's state in
TIF_LAZY_MMU_UPDATES. When we resume the task, check this
flag and re-enter lazy mmu mode if its set.

This sets things up for allowing lazy mmu mode while preemptible,
though that won't actually be active until the next change.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>


# 1b3fa2ce 06-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/x86: basic implementation of syscall tracing for x86

Provide the x86 trace callbacks to trace syscalls.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1236401580-5758-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fe40c0af 23-Jan-2009 Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

x86: uaccess: introduce try and catch framework

Impact: introduce new uaccess exception handling framework

Introduce {get|put}_user_try and {get|put}_user_catch as new uaccess exception
handling framework.
{get|put}_user_try begins exception block and {get|put}_user_catch(err) ends
the block and gets err if an exception occured in {get|put}_user_ex() in the
block. The exception is stored thread_info->uaccess_err.

The example usage of this framework is below;
int func()
{
int err = 0;

get_user_try {
get_user_ex(...);
get_user_ex(...);
:
} get_user_catch(err);

return err;
}

Note: get_user_ex() is not clear the value when an exception occurs, it's
different from the behavior of __get_user(), but I think it doesn't matter.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 9af45651 18-Jan-2009 Brian Gerst <brgerst@gmail.com>

x86-64: Move kernelstack from PDA to per-cpu.

Also clean up PER_CPU_VAR usage in xen-asm_64.S

tj: * remove now unused stack_thread_info()
* s/kernelstack/kernel_stack/
* added FIXME comment in xen-asm_64.S

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# c2724775 11-Dec-2008 Markus Metzger <markus.t.metzger@intel.com>

x86, bts: provide in-kernel branch-trace interface

Impact: cleanup

Move the BTS bits from ptrace.c into ds.c.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 241771ef 03-Dec-2008 Ingo Molnar <mingo@elte.hu>

performance counters: x86 support

Implement performance counters for x86 Intel CPUs.

It's simplified right now: the PERFMON CPU feature is assumed,
which is available in Core2 and later Intel CPUs.

The design is flexible to be extended to more CPU types as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# affa219b 03-Dec-2008 Joe Korty <joe.korty@ccur.com>

x86: change thread_info's flag field back to 32 bits

Impact: pack struct thread_info more tightly

Change x86_64's thread_info 'flags' field back to __u32.

This was changed to 'unsigned long' when the thread_info*.h
for i386 and x86_64 were merged. Change it back. We can
do this as only 27 bits of 'flags' are actually used.

This change actually packs down thread_info by 64 bits:
32 bits are saved by the smaller flags, and 32 bits are
saved by the following 'mm_segment_t field' becoming
naturally 64-bit aligned.

Signed-off-by: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f201ae23 22-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-return-tracer: store return stack into task_struct and allocate it dynamically

Impact: use deeper function tracing depth safely

Some tests showed that function return tracing needed a more deeper depth
of function calls. But it could be unsafe to store these return addresses
to the stack.

So these arrays will now be allocated dynamically into task_struct of current
only when the tracer is activated.

Typical scheme when tracer is activated:
- allocate a return stack for each task in global list.
- fork: allocate the return stack for the newly created task
- exit: free return stack of current
- idle init: same as fork

I chose a default depth of 50. I don't have overruns anymore.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0231022c 16-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-return-tracer: add the overrun field

Impact: help to find the better depth of trace

We decided to arbitrary define the depth of function return trace as
"20". Perhaps this is not enough. To help finding an optimal depth, we
measure now the overrun: the number of functions that have been missed
for the current thread. By default this is not displayed, we have to
do set a particular flag on the return tracer: echo overrun >
/debug/tracing/trace_options And the overrun will be printed on the
right.

As the trace shows below, the current 20 depth is not enough.

update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# caf4b323 10-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing, x86: add low level support for ftrace return tracing

Impact: add infrastructure for function-return tracing

Add low level support for ftrace return tracing.

This plug-in stores return addresses on the thread_info structure of
the current task.

The index of the current return address is initialized when the task
is the first one (init) and when a process forks (the child). It is
not needed when a task does a sys_execve because after this syscall,
it still needs to return on the kernel functions it called.

Note that the code of return_to_handler has been suggested by Steven
Rostedt as almost all of the ideas of improvements in this V3.

For purpose of security, arch/x86/kernel/process_32.c is not traced
because __switch_to() changes the current task during its execution.
That could cause inconsistency in the stored return address of this
function even if I didn't have any crash after testing with tracing on
this function enabled.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1965aae3 22-Oct-2008 H. Peter Anvin <hpa@zytor.com>

x86: Fix ASM_X86__ header guards

Change header guards named "ASM_X86__*" to "_ASM_X86_*" since:

a. the double underscore is ugly and pointless.
b. no leading underscore violates namespace constraints.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# bb898558 17-Aug-2008 Al Viro <viro@zeniv.linux.org.uk>

x86, um: ... and asm-x86 move

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>