History log of /linux-master/arch/x86/Kconfig
Revision Date Author Comments
# 4f511739 10-Apr-2024 Josh Poimboeuf <jpoimboe@kernel.org>

x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI

For consistency with the other CONFIG_MITIGATION_* options, replace the
CONFIG_SPECTRE_BHI_{ON,OFF} options with a single
CONFIG_MITIGATION_SPECTRE_BHI option.

[ mingo: Fix ]

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.1712813475.git.jpoimboe@kernel.org


# 36d4fe14 10-Apr-2024 Josh Poimboeuf <jpoimboe@kernel.org>

x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto

Unlike most other mitigations' "auto" options, spectre_bhi=auto only
mitigates newer systems, which is confusing and not particularly useful.

Remove it.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org


# ec9404e4 11-Mar-2024 Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

x86/bhi: Add BHI mitigation knob

Branch history clearing software sequences and hardware control
BHI_DIS_S were defined to mitigate Branch History Injection (BHI).

Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation:

auto - Deploy the hardware mitigation BHI_DIS_S, if available.
on - Deploy the hardware mitigation BHI_DIS_S, if available,
otherwise deploy the software sequence at syscall entry and
VMexit.
off - Turn off BHI mitigation.

The default is auto mode which does not deploy the software sequence
mitigation. This is because of the hardening done in the syscall
dispatch path, which is the likely target of BHI.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>


# b6540de9 24-Mar-2024 Uros Bizjak <ubizjak@gmail.com>

x86/percpu: Disable named address spaces for KCSAN

-fsanitize=thread (KCSAN) is at the moment incompatible
with named address spaces in a similar way as KASAN -
see GCC PR sanitizer/111736:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736

The patch disables named address spaces with KCSAN.

Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240325110128.615933-1-ubizjak@gmail.com


# 8af2d202 04-Feb-2024 Masahiro Yamada <masahiroy@kernel.org>

platform: goldfish: move the separate 'default' propery for CONFIG_GOLDFISH

Currently, there are two entries for CONFIG_GOLDFISH.

In arch/x86/Kconfig:

config GOLDFISH
def_bool y
depends on X86_GOLDFISH

In drivers/platform/goldfish/Kconfig:

menuconfig GOLDFISH
bool "Platform support for Goldfish virtual devices"
depends on HAS_IOMEM && HAS_DMA

While Kconfig allows multiple entries, it generally leads to tricky
code.

Prior to commit bd2f348db503 ("goldfish: refactor goldfish platform
configs"), CONFIG_GOLDFISH was an alias of CONFIG_X86_GOLDFISH.

After the mentioned commit added the second entry with a user prompt,
the former provides the 'default' property that is effective only when
X86_GOLDFISH=y.

Merge them tegether to clarify how it has worked in the past 8 years.

Cc: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240204081004.33871-1-masahiroy@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# f48212ee 04-Jan-2024 Paolo Bonzini <pbonzini@redhat.com>

treewide: remove CONFIG_HAVE_KVM

It has no users anymore.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 85fcde40 23-Jan-2024 Baoquan He <bhe@redhat.com>

kexec: split crashkernel reservation code out from crash_core.c

Patch series "Split crash out from kexec and clean up related config
items", v3.

Motivation:
=============
Previously, LKP reported a building error. When investigating, it can't
be resolved reasonablly with the present messy kdump config items.

https://lore.kernel.org/oe-kbuild-all/202312182200.Ka7MzifQ-lkp@intel.com/

The kdump (crash dumping) related config items could causes confusions:

Firstly,

CRASH_CORE enables codes including
- crashkernel reservation;
- elfcorehdr updating;
- vmcoreinfo exporting;
- crash hotplug handling;

Now fadump of powerpc, kcore dynamic debugging and kdump all selects
CRASH_CORE, while fadump
- fadump needs crashkernel parsing, vmcoreinfo exporting, and accessing
global variable 'elfcorehdr_addr';
- kcore only needs vmcoreinfo exporting;
- kdump needs all of the current kernel/crash_core.c.

So only enabling PROC_CORE or FA_DUMP will enable CRASH_CORE, this
mislead people that we enable crash dumping, actual it's not.

Secondly,

It's not reasonable to allow KEXEC_CORE select CRASH_CORE.

Because KEXEC_CORE enables codes which allocate control pages, copy
kexec/kdump segments, and prepare for switching. These codes are
shared by both kexec reboot and kdump. We could want kexec reboot,
but disable kdump. In that case, CRASH_CORE should not be selected.

--------------------
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
---------------------

Thirdly,

It's not reasonable to allow CRASH_DUMP select KEXEC_CORE.

That could make KEXEC_CORE, CRASH_DUMP are enabled independently from
KEXEC or KEXEC_FILE. However, w/o KEXEC or KEXEC_FILE, the KEXEC_CORE
code built in doesn't make any sense because no kernel loading or
switching will happen to utilize the KEXEC_CORE code.
---------------------
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_CRASH_DUMP=y
---------------------

In this case, what is worse, on arch sh and arm, KEXEC relies on MMU,
while CRASH_DUMP can still be enabled when !MMU, then compiling error is
seen as the lkp test robot reported in above link.

------arch/sh/Kconfig------
config ARCH_SUPPORTS_KEXEC
def_bool MMU

config ARCH_SUPPORTS_CRASH_DUMP
def_bool BROKEN_ON_SMP
---------------------------

Changes:
===========
1, split out crash_reserve.c from crash_core.c;
2, split out vmcore_infoc. from crash_core.c;
3, move crash related codes in kexec_core.c into crash_core.c;
4, remove dependency of FA_DUMP on CRASH_DUMP;
5, clean up kdump related config items;
6, wrap up crash codes in crash related ifdefs on all 8 arch-es
which support crash dumping, except of ppc;

Achievement:
===========
With above changes, I can rearrange the config item logic as below (the right
item depends on or is selected by the left item):

PROC_KCORE -----------> VMCORE_INFO

|----------> VMCORE_INFO
FA_DUMP----|
|----------> CRASH_RESERVE

---->VMCORE_INFO
/
|---->CRASH_RESERVE
KEXEC --| /|
|--> KEXEC_CORE--> CRASH_DUMP-->/-|---->PROC_VMCORE
KEXEC_FILE --| \ |
\---->CRASH_HOTPLUG


KEXEC --|
|--> KEXEC_CORE (for kexec reboot only)
KEXEC_FILE --|

Test
========
On all 8 architectures, including x86_64, arm64, s390x, sh, arm, mips,
riscv, loongarch, I did below three cases of config item setting and
building all passed. Take configs on x86_64 as exampmle here:

(1) Both CONFIG_KEXEC and KEXEC_FILE is unset, then all kexec/kdump
items are unset automatically:
# Kexec and crash features
# CONFIG_KEXEC is not set
# CONFIG_KEXEC_FILE is not set
# end of Kexec and crash features

(2) set CONFIG_KEXEC_FILE and 'make olddefconfig':
---------------
# Kexec and crash features
CONFIG_CRASH_RESERVE=y
CONFIG_VMCORE_INFO=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_HOTPLUG=y
CONFIG_CRASH_MAX_MEMORY_RANGES=8192
# end of Kexec and crash features
---------------

(3) unset CONFIG_CRASH_DUMP in case 2 and execute 'make olddefconfig':
------------------------
# Kexec and crash features
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
# end of Kexec and crash features
------------------------

Note:
For ppc, it needs investigation to make clear how to split out crash
code in arch folder. Hope Hari and Pingfan can help have a look, see if
it's doable. Now, I make it either have both kexec and crash enabled, or
disable both of them altogether.


This patch (of 14):

Both kdump and fa_dump of ppc rely on crashkernel reservation. Move the
relevant codes into separate files: crash_reserve.c,
include/linux/crash_reserve.h.

And also add config item CRASH_RESERVE to control its enabling of the
codes. And update config items which has relationship with crashkernel
reservation.

And also change ifdeffery from CONFIG_CRASH_CORE to CONFIG_CRASH_RESERVE
when those scopes are only crashkernel reservation related.

And also rename arch/XXX/include/asm/{crash_core.h => crash_reserve.h} on
arm64, x86 and risc-v because those architectures' crash_core.h is only
related to crashkernel reservation.

[akpm@linux-foundation.org: s/CRASH_RESEERVE/CRASH_RESERVE/, per Klara Modin]
Link: https://lkml.kernel.org/r/20240124051254.67105-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20240124051254.67105-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 3598e577 19-Jan-2024 Meng Li <li.meng@amd.com>

x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion

amd-pstate driver also uses SCHED_MC_PRIO, so decouple the requirement
of CPU_SUP_INTEL from the dependencies to allow compilation in kernels
without Intel CPU support.

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8076fcde 11-Mar-2024 Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

x86/rfds: Mitigate Register File Data Sampling (RFDS)

RFDS is a CPU vulnerability that may allow userspace to infer kernel
stale data previously used in floating point registers, vector registers
and integer registers. RFDS only affects certain Intel Atom processors.

Intel released a microcode update that uses VERW instruction to clear
the affected CPU buffers. Unlike MDS, none of the affected cores support
SMT.

Add RFDS bug infrastructure and enable the VERW based mitigation by
default, that clears the affected buffers just before exiting to
userspace. Also add sysfs reporting and cmdline parameter
"reg_file_data_sampling" to control the mitigation.

For details see:
Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>


# 5394f1e9 26-Feb-2024 Arnd Bergmann <arnd@arndb.de>

arch: define CONFIG_PAGE_SIZE_*KB on all architectures

Most architectures only support a single hardcoded page size. In order
to ensure that each one of these sets the corresponding Kconfig symbols,
change over the PAGE_SHIFT definition to the common one and allow
only the hardware page size to be selected.

Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# cb81deef 28-Feb-2024 Thomas Gleixner <tglx@linutronix.de>

x86/idle: Sanitize X86_BUG_AMD_E400 handling

amd_e400_idle(), the idle routine for AMD CPUs which are affected by
erratum 400 violates the RCU constraints by invoking tick_broadcast_enter()
and tick_broadcast_exit() after the core code has marked RCU non-idle. The
functions can end up in lockdep or tracing, which rightfully triggers a
RCU warning.

The core code provides now a static branch conditional invocation of the
broadcast functions.

Remove amd_e400_idle(), enforce default_idle() and enable the static branch
on affected CPUs to cure this.

[ bp: Fold in a fix for a IS_ENABLED() check fail missing a "CONFIG_"
prefix which tglx spotted. ]

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/877cim6sis.ffs@tglx


# 43b1d3e6 15-Dec-2023 Chris Koch <chrisko@google.com>

kexec: Allocate kernel above bzImage's pref_address

A relocatable kernel will relocate itself to pref_address if it is
loaded below pref_address. This means a booted kernel may be relocating
itself to an area with reserved memory on modern systems, potentially
clobbering arbitrary data that may be important to the system.

This is often the case, as the default value of PHYSICAL_START is
0x1000000 and kernels are typically loaded at 0x100000 or above by
bootloaders like iPXE or kexec. GRUB behaves like the approach
implemented here.

Also fixes the documentation around pref_address and PHYSICAL_START to
be accurate.

[ dhansen: changelog tweak ]

Co-developed-by: Cloud Hsu <cloudhsu@google.com>
Signed-off-by: Cloud Hsu <cloudhsu@google.com>
Signed-off-by: Chris Koch <chrisko@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Link: https://lore.kernel.org/all/20231215190521.3796022-1-chrisko%40google.com


# 918327e9 28-Jan-2024 Kees Cook <keescook@chromium.org>

ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL

For simplicity in splitting out UBSan options into separate rules,
remove CONFIG_UBSAN_SANITIZE_ALL, effectively defaulting to "y", which
is how it is generally used anyway. (There are no ":= y" cases beyond
where a specific file is enabled when a top-level ":= n" is in effect.)

Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>


# 29956748 02-Feb-2024 Borislav Petkov (AMD) <bp@alien8.de>

x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT

It was meant well at the time but nothing's using it so get rid of it.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240202163510.GDZb0Zvj8qOndvFOiZ@fat_crate.local


# 2cce9591 05-Dec-2023 H. Peter Anvin (Intel) <hpa@zytor.com>

x86/fred: Add Kconfig option for FRED (CONFIG_X86_FRED)

Add the configuration option CONFIG_X86_FRED to enable FRED.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Link: https://lore.kernel.org/r/20231205105030.8698-6-xin3.li@intel.com


# 0911b8c5 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK

Step 10/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Added one more case. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-11-leitao@debian.org


# a033eec9 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO

Step 9/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-10-leitao@debian.org


# 1da8d217 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY

Step 8/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-9-leitao@debian.org


# ac61d439 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY

Step 7/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-8-leitao@debian.org


# 7b75782f 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_SLS => CONFIG_MITIGATION_SLS

Step 6/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-7-leitao@debian.org


# aefb2f2e 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE

Step 5/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Converted a few more uses in comments/messages as well. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ariel Miculas <amiculas@cisco.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-6-leitao@debian.org


# ea4654e0 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_PAGE_TABLE_ISOLATION => CONFIG_MITIGATION_PAGE_TABLE_ISOLATION

Step 4/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Converted new uses that got added since the series was posted. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-5-leitao@debian.org


# 5fa31af3 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_CALL_DEPTH_TRACKING => CONFIG_MITIGATION_CALL_DEPTH_TRACKING

Step 3/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-4-leitao@debian.org


# e0b8fcfa 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_CPU_IBPB_ENTRY => CONFIG_MITIGATION_IBPB_ENTRY

Step 2/10 of the namespace unification of CPU mitigations related Kconfig options.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-3-leitao@debian.org


# be83e809 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_GDS_FORCE_MITIGATION => CONFIG_MITIGATION_GDS_FORCE

So the CPU mitigations Kconfig entries - there's 10 meanwhile - are named
in a historically idiosyncratic and hence rather inconsistent fashion
and have become hard to relate with each other over the years:

https://lore.kernel.org/lkml/20231011044252.42bplzjsam3qsasz@treble/

When they were introduced we never expected that we'd eventually have
about a dozen of them, and that more organization would be useful,
especially for Linux distributions that want to enable them in an
informed fashion, and want to make sure all mitigations are configured
as expected.

For example, the current CONFIG_SPECULATION_MITIGATIONS namespace is only
halfway populated, where some mitigations have entries in Kconfig, and
they could be modified, while others mitigations do not have Kconfig entries,
and can not be controlled at build time.

Fine-grained control over these Kconfig entries can help in a number of ways:

1) Users can choose and pick only mitigations that are important for
their workloads.

2) Users and developers can choose to disable mitigations that mangle
the assembly code generation, making it hard to read.

3) Separate Kconfigs for just source code readability,
so that we see *which* butt-ugly piece of crap code is for what
reason...

In most cases, if a mitigation is disabled at compilation time, it
can still be enabled at runtime using kernel command line arguments.

This is the first patch of an initial series that renames various
mitigation related Kconfig options, unifying them under a single
CONFIG_MITIGATION_* namespace:

CONFIG_GDS_FORCE_MITIGATION => CONFIG_MITIGATION_GDS_FORCE
CONFIG_CPU_IBPB_ENTRY => CONFIG_MITIGATION_IBPB_ENTRY
CONFIG_CALL_DEPTH_TRACKING => CONFIG_MITIGATION_CALL_DEPTH_TRACKING
CONFIG_PAGE_TABLE_ISOLATION => CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE
CONFIG_SLS => CONFIG_MITIGATION_SLS
CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY
CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY
CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO
CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK

Implement step 1/10 of the namespace unification of CPU mitigations related
Kconfig options and rename CONFIG_GDS_FORCE_MITIGATION to
CONFIG_MITIGATION_GDS_FORCE.

[ mingo: Rewrote changelog for clarity. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-2-leitao@debian.org


# e29aad08 09-Oct-2023 Uros Bizjak <ubizjak@gmail.com>

x86/percpu: Disable named address spaces for KASAN

-fsanitize=kernel-address (KASAN) is at the moment incompatible
with named address spaces - see GCC PR sanitizer/111736:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736

GCC is doing a KASAN check on a percpu address which it shouldn't do,
and didn't used to do because we did the access using inline asm.

But now that GCC does the accesses as normal (albeit special address
space) memory accesses, the KASAN code triggers on them too, and it
all goes to hell in a handbasket very quickly.

Those percpu accessor functions need to disable any KASAN
checking or other sanitizer checking. Not on the percpu address,
because that's not a "real" address, it's obviously just the offset
from the segment register.

And GCC should probably not have generated such code in the first
place, so arguably this is a bug with -fsanitize=kernel-address.

The patch also removes a stale dependency on CONFIG_SMP.

Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20231009151409.53656-1-ubizjak@gmail.com

Closes: https://lore.kernel.org/oe-lkp/202310071301.a5113890-oliver.sang@intel.com


# 1ca3683c 04-Oct-2023 Uros Bizjak <ubizjak@gmail.com>

x86/percpu: Enable named address spaces with known compiler version

Enable named address spaces with known compiler versions
(GCC 12.1 and later) in order to avoid possible issues with named
address spaces with older compilers. Set CC_HAS_NAMED_AS when the
compiler satisfies version requirements and set USE_X86_SEG_SUPPORT
to signal when segment qualifiers could be used.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20231004145137.86537-3-ubizjak@gmail.com


# 83e1bdc9 13-Dec-2023 Kai Huang <kai.huang@intel.com>

x86/virt/tdx: Make TDX host depend on X86_MCE

A build failure was reported that when INTEL_TDX_HOST is enabled but
X86_MCE is not, the tdx_dump_mce_info() function fails to link:

ld: vmlinux.o: in function `tdx_dump_mce_info':
...: undefined reference to `mce_is_memory_error'
...: undefined reference to `mce_usable_address'

The reason is in such configuration, despite there's no caller of
tdx_dump_mce_info() it is still built and there's no implementation for
the two "mce_*()" functions.

Make INTEL_TDX_HOST depend on X86_MCE to fix.

It makes sense to enable MCE support for the TDX host anyway. Because
the only way that TDX has to report integrity errors is an MCE, and it
is not good to silently ignore such MCE. The TDX spec also suggests
the host VMM is expected to implement the MCE handler.

Note it also makes sense to make INTEL_TDX_HOST select X86_MCE but this
generates "recursive dependency detected!" error in the Kconfig.

Closes: https://lore.kernel.org/all/20231212214612.GHZXjUpBFa1IwVMTI7@fat_crate.local/T/
Fixes: 70060463cb2b ("x86/mce: Differentiate real hardware #MCs from TDX erratum ones")
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/all/20231212214612.GHZXjUpBFa1IwVMTI7@fat_crate.local/T/#m1a109c29324b2bbd0b3b1d45c218012cd3a13be6
Link: https://lore.kernel.org/all/20231213222825.286809-1-kai.huang%40intel.com


# cb8eb06d 08-Dec-2023 Dave Hansen <dave.hansen@linux.intel.com>

x86/virt/tdx: Disable TDX host support when kexec is enabled

TDX host support currently lacks the ability to handle kexec. Disable TDX
when kexec is enabled.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20231208170740.53979-20-dave.hansen%40intel.com


# 8f23f5db 26-Oct-2023 Jason Gunthorpe <jgg@ziepe.ca>

iommu: Change kconfig around IOMMU_SVA

Linus suggested that the kconfig here is confusing:

https://lore.kernel.org/all/CAHk-=wgUiAtiszwseM1p2fCJ+sC4XWQ+YN4TanFhUgvUqjr9Xw@mail.gmail.com/

Let's break it into three kconfigs controlling distinct things:

- CONFIG_IOMMU_MM_DATA controls if the mm_struct has the additional
fields for the IOMMU. Currently only PASID, but later patches store
a struct iommu_mm_data *

- CONFIG_ARCH_HAS_CPU_PASID controls if the arch needs the scheduling bit
for keeping track of the ENQCMD instruction. x86 will select this if
IOMMU_SVA is enabled

- IOMMU_SVA controls if the IOMMU core compiles in the SVA support code
for iommu driver use and the IOMMU exported API

This way ARM will not enable CONFIG_ARCH_HAS_CPU_PASID

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20231027000525.1278806-2-tina.zhang@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>


# ac3a2208 08-Dec-2023 Kai Huang <kai.huang@intel.com>

x86/virt/tdx: Allocate and set up PAMTs for TDMRs

The TDX module uses additional metadata to record things like which
guest "owns" a given page of memory. This metadata, referred as
Physical Address Metadata Table (PAMT), essentially serves as the
'struct page' for the TDX module. PAMTs are not reserved by hardware
up front. They must be allocated by the kernel and then given to the
TDX module during module initialization.

TDX supports 3 page sizes: 4K, 2M, and 1G. Each "TD Memory Region"
(TDMR) has 3 PAMTs to track the 3 supported page sizes. Each PAMT must
be a physically contiguous area from a Convertible Memory Region (CMR).
However, the PAMTs which track pages in one TDMR do not need to reside
within that TDMR but can be anywhere in CMRs. If one PAMT overlaps with
any TDMR, the overlapping part must be reported as a reserved area in
that particular TDMR.

Use alloc_contig_pages() since PAMT must be a physically contiguous area
and it may be potentially large (~1/256th of the size of the given TDMR).
The downside is alloc_contig_pages() may fail at runtime. One (bad)
mitigation is to launch a TDX guest early during system boot to get
those PAMTs allocated at early time, but the only way to fix is to add a
boot option to allocate or reserve PAMTs during kernel boot.

It is imperfect but will be improved on later.

TDX only supports a limited number of reserved areas per TDMR to cover
both PAMTs and memory holes within the given TDMR. If many PAMTs are
allocated within a single TDMR, the reserved areas may not be sufficient
to cover all of them.

Adopt the following policies when allocating PAMTs for a given TDMR:

- Allocate three PAMTs of the TDMR in one contiguous chunk to minimize
the total number of reserved areas consumed for PAMTs.
- Try to first allocate PAMT from the local node of the TDMR for better
NUMA locality.

Also dump out how many pages are allocated for PAMTs when the TDX module
is initialized successfully. This helps answer the eternal "where did
all my memory go?" questions.

[ dhansen: merge in error handling cleanup ]

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/all/20231208170740.53979-11-dave.hansen%40intel.com


# abe8dbab 08-Dec-2023 Kai Huang <kai.huang@intel.com>

x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory

Start to transit out the "multi-steps" to initialize the TDX module.

TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums. Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR). During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees. The list of these
ranges is available to the kernel by querying the TDX module.

CMRs tell the kernel which memory is TDX compatible. The kernel needs
to build a list of memory regions (out of CMRs) as "TDX-usable" memory
and pass them to the TDX module. Once this is done, those "TDX-usable"
memory regions are fixed during module's lifetime.

To keep things simple, assume that all TDX-protected memory will come
from the page allocator. Make sure all pages in the page allocator
*are* TDX-usable memory.

As TDX-usable memory is a fixed configuration, take a snapshot of the
memory configuration from memblocks at the time of module initialization
(memblocks are modified on memory hotplug). This snapshot is used to
enable TDX support for *this* memory configuration only. Use a memory
hotplug notifier to ensure that no other RAM can be added outside of
this configuration.

This approach requires all memblock memory regions at the time of module
initialization to be TDX convertible memory to work, otherwise module
initialization will fail in a later SEAMCALL when passing those regions
to the module. This approach works when all boot-time "system RAM" is
TDX convertible memory and no non-TDX-convertible memory is hot-added
to the core-mm before module initialization.

For instance, on the first generation of TDX machines, both CXL memory
and NVDIMM are not TDX convertible memory. Using kmem driver to hot-add
any CXL memory or NVDIMM to the core-mm before module initialization
will result in failure to initialize the module. The SEAMCALL error
code will be available in the dmesg to help user to understand the
failure.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/all/20231208170740.53979-7-dave.hansen%40intel.com


# 3115cabd 08-Dec-2023 Kai Huang <kai.huang@intel.com>

x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC

TDX capable platforms are locked to X2APIC mode and cannot fall back to
the legacy xAPIC mode when TDX is enabled by the BIOS. TDX host support
requires x2APIC. Make INTEL_TDX_HOST depend on X86_X2APIC.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/lkml/ba80b303-31bf-d44a-b05d-5c0f83038798@intel.com/
Link: https://lore.kernel.org/all/20231208170740.53979-3-dave.hansen%40intel.com


# 5b95f94c 21-Nov-2023 James Morse <james.morse@arm.com>

x86/topology: Switch over to GENERIC_CPU_DEVICES

Now that GENERIC_CPU_DEVICES calls arch_register_cpu(), which can be
overridden by the arch code, switch over to this to allow common code
to choose when the register_cpu() call is made.

x86's struct cpus come from struct x86_cpu, which has no other members
or users. Remove this and use the version defined by common code.

This is an intermediate step to the logic being moved to drivers/acpi,
where GENERIC_CPU_DEVICES will do the work when booting with acpi=off.

This patch also has the effect of moving the registration of CPUs from
subsys to driver core initialisation, prior to any initcalls running.

----
Changes since RFC:
* Fixed the second copy of arch_register_cpu() used for non-hotplug
Changes since RFC v2:
* Remove duplicate of the weak generic arch_register_cpu(), spotted
by Jonathan Cameron. Add note about initialisation order change.
Changes since RFC v3:
* Adapt to removal of EXPORT_SYMBOL()s

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R3l-00Cszm-UA@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# a02f66bb 21-Nov-2023 James Morse <james.morse@arm.com>

ACPI: Move ACPI_HOTPLUG_CPU to be disabled on arm64 and riscv

Neither arm64 nor riscv support physical hotadd of CPUs that were not
present at boot. For arm64 much of the platform description is in static
tables which do not have update methods. arm64 does support HOTPLUG_CPU,
which is backed by a firmware interface to turn CPUs on and off.

acpi_processor_hotadd_init() and acpi_processor_remove() are for adding
and removing CPUs that were not present at boot. arm64 systems that do this
are not supported as there is currently insufficient information in the
platform description. (e.g. did the GICR get removed too?)

arm64 currently relies on the MADT enabled flag check in map_gicc_mpidr()
to prevent CPUs that were not described as present at boot from being
added to the system. Similarly, riscv relies on the same check in
map_rintc_hartid(). Both architectures also rely on the weak 'always fails'
definitions of acpi_map_cpu() and arch_register_cpu().

Subsequent changes will redefine ACPI_HOTPLUG_CPU as making possible
CPUs present. Neither arm64 nor riscv support this.

Disable ACPI_HOTPLUG_CPU for arm64 and riscv by removing 'default y' and
selecting it on the other three ACPI architectures. This allows the weak
definitions of some symbols to be removed.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R31-00Csyt-Jq@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 71ce1ab5 27-Dec-2023 Kinsey Ho <kinseyho@google.com>

mm/mglru: add CONFIG_ARCH_HAS_HW_PTE_YOUNG

Patch series "mm/mglru: Kconfig cleanup", v4.

This series is the result of the following discussion:
https://lore.kernel.org/47066176-bd93-55dd-c2fa-002299d9e034@linux.ibm.com/

It mainly avoids building the code that walks page tables on CPUs that
use it, i.e., those don't support hardware accessed bit. Specifically,
it introduces a new Kconfig to guard some of functions added by
commit bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
on CPUs like POWER9, on which the series was tested.


This patch (of 5):

Some architectures are able to set the accessed bit in PTEs when PTEs
are used as part of linear address translations.

Add CONFIG_ARCH_HAS_HW_PTE_YOUNG for such architectures to be able to
override arch_has_hw_pte_young().

Link: https://lkml.kernel.org/r/20231227141205.2200125-1-kinseyho@google.com
Link: https://lkml.kernel.org/r/20231227141205.2200125-2-kinseyho@google.com
Signed-off-by: Kinsey Ho <kinseyho@google.com>
Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Donet Tom <donettom@linux.vnet.ibm.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 2a19be61 02-Oct-2023 Vlastimil Babka <vbabka@suse.cz>

mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile

Remove CONFIG_SLAB, CONFIG_DEBUG_SLAB, CONFIG_SLAB_DEPRECATED and
everything in Kconfig files and mm/Makefile that depends on those. Since
SLUB is the only remaining allocator, remove the allocator choice, make
CONFIG_SLUB a "def_bool y" for now and remove all explicit dependencies
on SLUB or SLAB as it's now always enabled. Make every option's verbose
name and description refer to "the slab allocator" without refering to
the specific implementation. Do not rename the CONFIG_ option names yet.

Everything under #ifdef CONFIG_SLAB, and mm/slab.c is now dead code, all
code under #ifdef CONFIG_SLUB is now always compiled.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>


# 88a2b4ed 04-Dec-2023 Arnd Bergmann <arnd@arndb.de>

x86/Kconfig: Rework CONFIG_X86_PAE dependency

While looking at a Xen Kconfig dependency issue, I tried to understand the
exact dependencies for CONFIG_X86_PAE, which is selected by CONFIG_HIGHMEM64G
but can also be enabled manually.

Apparently the dependencies for CONFIG_HIGHMEM64G are strictly about CPUs
that do support PAE, but the actual feature can be incorrectly enabled on
older CPUs as well. The CONFIG_X86_CMPXCHG64 dependencies on the other hand
include X86_PAE because cmpxchg8b is requried for PAE to work.

Rework this for readability and correctness, using a positive list of CPUs
that support PAE in a new X86_HAVE_PAE symbol that can serve as a dependency
for both X86_PAE and HIGHMEM64G as well as simplify the X86_CMPXCHG64
dependency list.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231204084722.3789473-2-arnd@kernel.org


# c6454559 28-Nov-2023 Lukas Bulwahn <lukas.bulwahn@gmail.com>

x86/Kconfig: Remove obsolete config X86_32_SMP

Commit

0f08c3b22996 ("x86/smp: Reduce code duplication")

removed the only use of CONFIG_X86_32_SMP.

Remove the now obsolete config X86_32_SMP too.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231128090016.29676-1-lukas.bulwahn@gmail.com


# c1ad12ee 23-Oct-2023 Arnd Bergmann <arnd@arndb.de>

kexec: fix KEXEC_FILE dependencies

The cleanup for the CONFIG_KEXEC Kconfig logic accidentally changed the
'depends on CRYPTO=y' dependency to a plain 'depends on CRYPTO', which
causes a link failure when all the crypto support is in a loadable module
and kexec_file support is built-in:

x86_64-linux-ld: vmlinux.o: in function `__x64_sys_kexec_file_load':
(.text+0x32e30a): undefined reference to `crypto_alloc_shash'
x86_64-linux-ld: (.text+0x32e58e): undefined reference to `crypto_shash_update'
x86_64-linux-ld: (.text+0x32e6ee): undefined reference to `crypto_shash_final'

Both s390 and x86 have this problem, while ppc64 and riscv have the
correct dependency already. On riscv, the dependency is only used for the
purgatory, not for the kexec_file code itself, which may be a bit
surprising as it means that with CONFIG_CRYPTO=m, it is possible to enable
KEXEC_FILE but then the purgatory code is silently left out.

Move this into the common Kconfig.kexec file in a way that is correct
everywhere, using the dependency on CRYPTO_SHA256=y only when the
purgatory code is available. This requires reversing the dependency
between ARCH_SUPPORTS_KEXEC_PURGATORY and KEXEC_FILE, but the effect
remains the same, other than making riscv behave like the other ones.

On s390, there is an additional dependency on CRYPTO_SHA256_S390, which
should technically not be required but gives better performance. Remove
this dependency here, noting that it was not present in the initial
Kconfig code but was brought in without an explanation in commit
71406883fd357 ("s390/kexec_file: Add kexec_file_load system call").

[arnd@arndb.de: fix riscv build]
Link: https://lkml.kernel.org/r/67ddd260-d424-4229-a815-e3fcfb864a77@app.fastmail.com
Link: https://lkml.kernel.org/r/20231023110308.1202042-1-arnd@kernel.org
Fixes: 6af5138083005 ("x86/kexec: refactor for kernel/Kconfig.kexec")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com>
Tested-by: Eric DeVolder <eric_devolder@yahoo.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Conor Dooley <conor@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 9407bda8 17-Oct-2023 Thomas Gleixner <tglx@linutronix.de>

x86/microcode: Prepare for minimal revision check

Applying microcode late can be fatal for the running kernel when the
update changes functionality which is in use already in a non-compatible
way, e.g. by removing a CPUID bit.

There is no way for admins which do not have access to the vendors deep
technical support to decide whether late loading of such a microcode is
safe or not.

Intel has added a new field to the microcode header which tells the
minimal microcode revision which is required to be active in the CPU in
order to be safe.

Provide infrastructure for handling this in the core code and a command
line switch which allows to enforce it.

If the update is considered safe the kernel is not tainted and the annoying
warning message not emitted. If it's enforced and the currently loaded
microcode revision is not safe for late loading then the load is aborted.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211724.079611170@linutronix.de


# 634ac23a 02-Oct-2023 Thomas Gleixner <tglx@linutronix.de>

x86/microcode: Handle "nosmt" correctly

On CPUs where microcode loading is not NMI-safe the SMT siblings which
are parked in one of the play_dead() variants still react to NMIs.

So if an NMI hits while the primary thread updates the microcode the
resulting behaviour is undefined. The default play_dead() implementation on
modern CPUs is using MWAIT which is not guaranteed to be safe against
a microcode update which affects MWAIT.

Take the cpus_booted_once_mask into account to detect this case and
refuse to load late if the vendor specific driver does not advertise
that late loading is NMI safe.

AMD stated that this is safe, so mark the AMD driver accordingly.

This requirement will be partially lifted in later changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.087472735@linutronix.de


# fdbd4381 17-Oct-2023 Thomas Gleixner <tglx@linutronix.de>

x86/microcode: Provide CONFIG_MICROCODE_INITRD32

Create an aggregate config switch which covers X86_32, MICROCODE and
BLK_DEV_INITRD to avoid lengthy #ifdeffery in upcoming code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211722.236208250@linutronix.de


# 9c08a2a1 13-Sep-2023 Baoquan He <bhe@redhat.com>

x86: kdump: use generic interface to simplify crashkernel reservation code

With the help of newly changed function parse_crashkernel() and generic
reserve_crashkernel_generic(), crashkernel reservation can be simplified
by steps:

1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN,
CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and
DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>;

2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
reserve_crashkernel_generic(), and do the ARCH specific work if
needed.

3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
arch/x86/Kconfig.

When adding DEFAULT_CRASH_KERNEL_LOW_SIZE, add crash_low_size_default() to
calculate crashkernel low memory because x86_64 has special requirement.

The old reserve_crashkernel_low() and reserve_crashkernel() can be
removed.

[bhe@redhat.com: move crash_low_size_default() code into <asm/crash_core.h>]
Link: https://lkml.kernel.org/r/ZQpeAjOmuMJBFw1/@MiWiFi-R3L-srv
Link: https://lkml.kernel.org/r/20230914033142.676708-7-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# c33621b4 15-Aug-2023 Kai Huang <kai.huang@intel.com>

x86/virt/tdx: Wire up basic SEAMCALL functions

Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
host and certain physical attacks. A CPU-attested software module
called 'the TDX module' runs inside a new isolated memory range as a
trusted hypervisor to manage and run protected VMs.

TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
mode runs only the TDX module itself or other code to load the TDX
module.

The host kernel communicates with SEAM software via a new SEAMCALL
instruction. This is conceptually similar to a guest->host hypercall,
except it is made from the host to SEAM software instead. The TDX
module establishes a new SEAMCALL ABI which allows the host to
initialize the module and to manage VMs.

The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
TDCALL infrastructure. Wire up basic functions to make SEAMCALLs for
the basic support of running TDX guests: __seamcall(), __seamcall_ret(),
and __seamcall_saved_ret() for TDH.VP.ENTER. All SEAMCALLs involved in
the basic TDX support don't use "callee-saved" registers as input and
output, except the TDH.VP.ENTER.

To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
TDX host kernel support. Add a new Kconfig option CONFIG_INTEL_TDX_HOST
to opt-in TDX host kernel support (to distinguish with TDX guest kernel
support). So far only KVM uses TDX. Make the new config option depend
on KVM_INTEL.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Isaku Yamahata <isaku.yamahata@intel.com>
Link: https://lore.kernel.org/all/4db7c3fc085e6af12acc2932294254ddb3d320b3.1692096753.git.kai.huang%40intel.com


# 0c436a58 25-Aug-2023 Saurabh Sengar <ssengar@linux.microsoft.com>

x86/numa: Add Devicetree support

Hyper-V has usecases where it needs to fetch NUMA information from
Devicetree. Currently, it is not possible to extract the NUMA information
from Devicetree for the x86 arch.

Add support for Devicetree in the x86_numa_init() function, allowing the
retrieval of NUMA node information from the Devicetree.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/1692949657-16446-2-git-send-email-ssengar@linux.microsoft.com


# a432b7c0 18-Sep-2023 Uros Bizjak <ubizjak@gmail.com>

locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_CMPXCHG64

The following commit:

bc08b449ee14 ("lockref: implement lockless reference count updates using cmpxchg()")

enabled lockless reference count updates using cmpxchg() only for x86_64,
and left x86_32 behind due to inability to detect support for
cmpxchg8b instruction.

Nowadays, we can use CONFIG_X86_CMPXCHG64 for this purpose. Also,
by using try_cmpxchg64() instead of cmpxchg64() in the CMPXCHG_LOOP macro,
the compiler actually produces sane code, improving the
lockref_get_not_zero() main loop from:

eb: 8d 48 01 lea 0x1(%eax),%ecx
ee: 85 c0 test %eax,%eax
f0: 7e 2f jle 121 <lockref_get_not_zero+0x71>
f2: 8b 44 24 10 mov 0x10(%esp),%eax
f6: 8b 54 24 14 mov 0x14(%esp),%edx
fa: 8b 74 24 08 mov 0x8(%esp),%esi
fe: f0 0f c7 0e lock cmpxchg8b (%esi)
102: 8b 7c 24 14 mov 0x14(%esp),%edi
106: 89 c1 mov %eax,%ecx
108: 89 c3 mov %eax,%ebx
10a: 8b 74 24 10 mov 0x10(%esp),%esi
10e: 89 d0 mov %edx,%eax
110: 31 fa xor %edi,%edx
112: 31 ce xor %ecx,%esi
114: 09 f2 or %esi,%edx
116: 75 58 jne 170 <lockref_get_not_zero+0xc0>

to:

350: 8d 4f 01 lea 0x1(%edi),%ecx
353: 85 ff test %edi,%edi
355: 7e 79 jle 3d0 <lockref_get_not_zero+0xb0>
357: f0 0f c7 0e lock cmpxchg8b (%esi)
35b: 75 53 jne 3b0 <lockref_get_not_zero+0x90>

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20230918184050.9180-1-ubizjak@gmail.com


# a11e0975 23-Jun-2023 Nikolay Borisov <nik.borisov@suse.com>

x86: Make IA32_EMULATION boot time configurable

Distributions would like to reduce their attack surface as much as
possible but at the same time they'd want to retain flexibility to cater
to a variety of legacy software. This stems from the conjecture that
compat layer is likely rarely tested and could have latent security
bugs. Ideally distributions will set their default policy and also
give users the ability to override it as appropriate.

To enable this use case, introduce CONFIG_IA32_EMULATION_DEFAULT_DISABLED
compile time option, which controls whether 32bit processes/syscalls
should be allowed or not. This option is aimed mainly at distributions
to set their preferred default behavior in their kernels.

To allow users to override the distro's policy, introduce the 'ia32_emulation'
parameter which allows overriding CONFIG_IA32_EMULATION_DEFAULT_DISABLED
state at boot time.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230623111409.3047467-7-nik.borisov@suse.com


# aba7e066 02-Aug-2023 Ard Biesheuvel <ardb@kernel.org>

efi/x86: Ensure that EFI_RUNTIME_MAP is enabled for kexec

CONFIG_EFI_RUNTIME_MAP needs to be enabled in order for kexec to be able
to provide the required information about the EFI runtime mappings to
the incoming kernel, regardless of whether kexec_load() or
kexec_file_load() is being used. Without this information, kexec boot in
EFI mode is not possible.

The CONFIG_EFI_RUNTIME_MAP option is currently directly configurable if
CONFIG_EXPERT is enabled, so that it can be turned on for debugging
purposes even if KEXEC is not enabled. However, the upshot of this is
that it can also be disabled even when it shouldn't.

So tweak the Kconfig declarations to avoid this situation.

Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 54acee60 02-Aug-2023 Dave Hansen <dave.hansen@linux.intel.com>

x86/kbuild: Fix Documentation/ reference

x86 documentation moved. Fix the reference.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202307271956.IxoG9X0c-lkp@intel.com/


# 18e66b69 12-Jun-2023 Rick Edgecombe <rick.p.edgecombe@intel.com>

x86/shstk: Add Kconfig option for shadow stack

Shadow stack provides protection for applications against function return
address corruption. It is active when the processor supports it, the
kernel has CONFIG_X86_SHADOW_STACK enabled, and the application is built
for the feature. This is only implemented for the 64-bit kernel. When it
is enabled, legacy non-shadow stack applications continue to work, but
without protection.

Since there is another feature that utilizes CET (Kernel IBT) that will
share implementation with shadow stacks, create CONFIG_CET to signify
that at least one CET feature is configured.

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230613001108.3040476-7-rick.p.edgecombe%40intel.com


# ea53ad9c 14-Aug-2023 Eric DeVolder <eric.devolder@oracle.com>

x86/crash: add x86 crash hotplug support

When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system, must also
be updated.

A new elfcorehdr is generated from the available CPUs and memory and
replaces the existing elfcorehdr. The segment containing the elfcorehdr
is identified at run-time in crash_core:crash_handle_hotplug_event().

No modifications to purgatory (see 'kexec: exclude elfcorehdr from the
segment digest') or boot_params (as the elfcorehdr= capture kernel command
line parameter pointer remains unchanged and correct) are needed, just
elfcorehdr.

For kexec_file_load(), the elfcorehdr segment size is based on NR_CPUS and
CRASH_MAX_MEMORY_RANGES in order to accommodate a growing number of CPU
and memory resources.

For kexec_load(), the userspace kexec utility needs to size the elfcorehdr
segment in the same/similar manner.

To accommodate kexec_load() syscall in the absence of kexec_file_load()
syscall support, prepare_elf_headers() and dependents are moved outside of
CONFIG_KEXEC_FILE.

[eric.devolder@oracle.com: correct unused function build error]
Link: https://lkml.kernel.org/r/20230821182644.2143-1-eric.devolder@oracle.com
Link: https://lkml.kernel.org/r/20230814214446.6659-6-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Akhil Raj <lf32.dev@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 04d5ea46 08-Aug-2023 Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

mm/memory_hotplug: simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig

Patch series "Add support for memmap on memory feature on ppc64", v8.

This patch series update memmap on memory feature to fall back to
memmap allocation outside the memory block if the alignment rules are
not met. This makes the feature more useful on architectures like
ppc64 where alignment rules are different with 64K page size.


This patch (of 6):

Instead of adding menu entry with all supported architectures, add
mm/Kconfig variable and select the same from supported architectures.

No functional change in this patch.

Link: https://lkml.kernel.org/r/20230808091501.287660-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20230808091501.287660-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# e6265fe7 11-Jul-2023 Eric DeVolder <eric.devolder@oracle.com>

kexec: rename ARCH_HAS_KEXEC_PURGATORY

The Kconfig refactor to consolidate KEXEC and CRASH options utilized
option names of the form ARCH_SUPPORTS_<option>. Thus rename the
ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow
the same.

Link: https://lkml.kernel.org/r/20230712161545.87870-15-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 6af51380 11-Jul-2023 Eric DeVolder <eric.devolder@oracle.com>

x86/kexec: refactor for kernel/Kconfig.kexec

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-3-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 0b6f1582 24-Jul-2023 Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

mm/vmemmap optimization: split hugetlb and devdax vmemmap optimization

Arm disabled hugetlb vmemmap optimization [1] because hugetlb vmemmap
optimization includes an update of both the permissions (writeable to
read-only) and the output address (pfn) of the vmemmap ptes. That is not
supported without unmapping of pte(marking it invalid) by some
architectures.

With DAX vmemmap optimization we don't require such pte updates and
architectures can enable DAX vmemmap optimization while having hugetlb
vmemmap optimization disabled. Hence split DAX optimization support into
a different config.

s390, loongarch and riscv don't have devdax support. So the DAX config is
not enabled for them. With this change, arm64 should be able to select
DAX optimization

[1] commit 060a2c92d1b6 ("arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP")

Link: https://lkml.kernel.org/r/20230724190759.483013-8-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# e6bcfdd7 10-Aug-2023 Thomas Gleixner <tglx@linutronix.de>

x86/microcode: Hide the config knob

In reality CONFIG_MICROCODE is enabled in any reasonable configuration when
Intel or AMD support is enabled. Accommodate to reality.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230812195727.660453052@linutronix.de


# fb3bd914 28-Jun-2023 Borislav Petkov (AMD) <bp@alien8.de>

x86/srso: Add a Speculative RAS Overflow mitigation

Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.

In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>


# 53cf5797 12-Jul-2023 Daniel Sneddon <daniel.sneddon@linux.intel.com>

x86/speculation: Add Kconfig option for GDS

Gather Data Sampling (GDS) is mitigated in microcode. However, on
systems that haven't received the updated microcode, disabling AVX
can act as a mitigation. Add a Kconfig option that uses the microcode
mitigation if available and disables AVX otherwise. Setting this
option has no effect on systems not affected by GDS. This is the
equivalent of setting gather_data_sampling=force.

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>


# d938ba17 08-Apr-2023 Donglin Peng <pengdonglin@sangfor.com.cn>

x86/ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL

The previous patch ("function_graph: Support recording and printing
the return value of function") has laid the groundwork for the for
the funcgraph-retval, and this modification makes it available on
the x86 platform.

We introduce a new structure called fgraph_ret_regs for the x86
platform to hold return registers and the frame pointer. We then
fill its content in the return_to_handler and pass its address
to the function ftrace_return_to_handler to record the return
value.

Link: https://lkml.kernel.org/r/53a506f0f18ff4b7aeb0feb762f1c9a5e9b83ee9.1680954589.git.pengdonglin@sangfor.com.cn

Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# c2508ec5 15-Jun-2023 Linus Torvalds <torvalds@linux-foundation.org>

mm: introduce new 'lock_mm_and_find_vma()' page fault helper

.. and make x86 use it.

This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.

We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.

That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.

So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.

Note: I say "common helper", but it really only handles the normal
stack-grows-down case. Which is all architectures except for PA-RISC
and IA64. So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.

It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.

But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7c7077a7 13-Jun-2023 Thomas Gleixner <tglx@linutronix.de>

x86/cpu: Switch to arch_cpu_finalize_init()

check_bugs() is a dumping ground for finalizing the CPU bringup. Only parts of
it has to do with actual CPU bugs.

Split it apart into arch_cpu_finalize_init() and cpu_select_mitigations().

Fixup the bogus 32bit comments while at it.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230613224545.019583869@linutronix.de


# 6c321179 06-Jun-2023 Tom Lendacky <thomas.lendacky@amd.com>

x86/sev: Add SNP-specific unaccepted memory support

Add SNP-specific hooks to the unaccepted memory support in the boot
path (__accept_memory()) and the core kernel (accept_memory()) in order
to support booting SNP guests when unaccepted memory is present. Without
this support, SNP guests will fail to boot and/or panic() when unaccepted
memory is present in the EFI memory map.

The process of accepting memory under SNP involves invoking the hypervisor
to perform a page state change for the page to private memory and then
issuing a PVALIDATE instruction to accept the page.

Since the boot path and the core kernel paths perform similar operations,
move the pvalidate_pages() and vmgexit_psc() functions into sev-shared.c
to avoid code duplication.

Create the new header file arch/x86/boot/compressed/sev.h because adding
the function declaration to any of the existing SEV related header files
pulls in too many other header files, causing the build to fail.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/a52fa69f460fd1876d70074b20ad68210dfc31dd.1686063086.git.thomas.lendacky@amd.com


# 75d090fd 06-Jun-2023 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/tdx: Add unaccepted memory support

Hookup TDX-specific code to accept memory.

Accepting the memory is done with ACCEPT_PAGE module call on every page
in the range. MAP_GPA hypercall is not required as the unaccepted memory
is considered private already.

Extract the part of tdx_enc_status_changed() that does memory acceptance
in a new helper. Move the helper tdx-shared.c. It is going to be used by
both main kernel and decompressor.

[ bp: Fix the INTEL_TDX_GUEST=y, KVM_GUEST=n build. ]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230606142637.5171-10-kirill.shutemov@linux.intel.com


# 7583e8fb 10-May-2023 Lukas Bulwahn <lukas.bulwahn@gmail.com>

x86/cpu: Remove X86_FEATURE_NAMES

While discussing to change the visibility of X86_FEATURE_NAMES (see Link)
in order to remove CONFIG_EMBEDDED, Boris suggested to simply make the
X86_FEATURE_NAMES functionality unconditional.

As the need for really tiny kernel images has gone away and kernel images
with !X86_FEATURE_NAMES are hardly tested, remove this config and the whole
ifdeffery in the source code.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/20230509084007.24373-1-lukas.bulwahn@gmail.com/
Link: https://lore.kernel.org/r/20230510065713.10996-3-lukas.bulwahn@gmail.com


# 424e23fd 10-May-2023 Lukas Bulwahn <lukas.bulwahn@gmail.com>

x86/Kconfig: Make X86_FEATURE_NAMES non-configurable in prompt

While discussing to change the visibility of X86_FEATURE_NAMES (see Link)
in order to remove CONFIG_EMBEDDED, Boris suggested to simply make the
X86_FEATURE_NAMES functionality unconditional.

As a first step, make X86_FEATURE_NAMES disappear from Kconfig. So, as
X86_FEATURE_NAMES defaults to yes, to disable it, one now needs to
modify the .config file before compiling the kernel.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/20230509084007.24373-1-lukas.bulwahn@gmail.com/


# 0c7ffa32 12-May-2023 Thomas Gleixner <tglx@linutronix.de>

x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it

Implement the validation function which tells the core code whether
parallel bringup is possible.

The only condition for now is that the kernel does not run in an encrypted
guest as these will trap the RDMSR via #VC, which cannot be handled at that
point in early startup.

There was an earlier variant for AMD-SEV which used the GHBC protocol for
retrieving the APIC ID via CPUID, but there is no guarantee that the
initial APIC ID in CPUID is the same as the real APIC ID. There is no
enforcement from the secure firmware and the hypervisor can assign APIC IDs
as it sees fit as long as the ACPI/MADT table is consistent with that
assignment.

Unfortunately there is no RDMSR GHCB protocol at the moment, so enabling
AMD-SEV guests for parallel startup needs some more thought.

Intel-TDX provides a secure RDMSR hypercall, but supporting that is outside
the scope of this change.

Fixup announce_cpu() as e.g. on Hyper-V CPU1 is the secondary sibling of
CPU0, which makes the @cpu == 1 logic in announce_cpu() fall apart.

[ mikelley: Reported the announce_cpu() fallout

Originally-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.467571745@linutronix.de


# 8b5a0f95 12-May-2023 Thomas Gleixner <tglx@linutronix.de>

x86/smpboot: Enable split CPU startup

The x86 CPU bringup state currently does AP wake-up, wait for AP to
respond and then release it for full bringup.

It is safe to be split into a wake-up and and a separate wait+release
state.

Provide the required functions and enable the split CPU bringup, which
prepares for parallel bringup, where the bringup of the non-boot CPUs takes
two iterations: One to prepare and wake all APs and the second to wait and
release them. Depending on timing this can eliminate the wait time
completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.133453992@linutronix.de


# 2711b8e2 12-May-2023 Thomas Gleixner <tglx@linutronix.de>

x86/smpboot: Switch to hotplug core state synchronization

The new AP state tracking and synchronization mechanism in the CPU hotplug
core code allows to remove quite some x86 specific code:

1) The AP alive synchronization based on cpumasks

2) The decision whether an AP can be brought up again

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205256.529657366@linutronix.de


# e59e74dc 12-May-2023 Thomas Gleixner <tglx@linutronix.de>

x86/topology: Remove CPU0 hotplug option

This was introduced together with commit e1c467e69040 ("x86, hotplug: Wake
up CPU0 via NMI instead of INIT, SIPI, SIPI") to eventually support
physical hotplug of CPU0:

"We'll change this code in the future to wake up hard offlined CPU0 if
real platform and request are available."

11 years later this has not happened and physical hotplug is not officially
supported. Remove the cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.715707999@linutronix.de


# 0b376f1e 11-Apr-2023 Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP

Now we use ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to
indicate devdax and hugetlb vmemmap optimization support. Hence rename
that to a generic ARCH_WANT_OPTIMIZE_VMEMMAP

Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Tarun Sahu <tsahu@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 0bff0aae 27-Feb-2023 Suren Baghdasaryan <surenb@google.com>

x86/mm: try VMA lock-based page fault handling first

Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.

Link: https://lkml.kernel.org/r/20230227173632.3292573-30-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 6449dcb0 12-Mar-2023 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86: CPUID and CR3/CR4 flags for Linear Address Masking

Enumerate Linear Address Masking and provide defines for CR3 and CR4
flags.

The new CONFIG_ADDRESS_MASKING option enables the feature support in
kernel.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/all/20230312112612.31869-4-kirill.shutemov%40linux.intel.com


# fcbfe812 23-Mar-2023 Niklas Schnelle <schnelle@linux.ibm.com>

Kconfig: introduce HAS_IOPORT option and select it as necessary

We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
Port access. In a future patch HAS_IOPORT=n will disable compilation of
the I/O accessor functions inb()/outb() and friends on architectures
which can not meaningfully support legacy I/O spaces such as s390.

The following architectures do not select HAS_IOPORT:

* ARC
* C-SKY
* Hexagon
* Nios II
* OpenRISC
* s390
* User-Mode Linux
* Xtensa

All other architectures select HAS_IOPORT at least conditionally.

The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
for HAS_IOPORT specific sections will be added in subsequent patches on
a per subsystem basis.

Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net> # for ARCH=um
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# d276134e 24-Mar-2023 Paul E. McKenney <paulmck@kernel.org>

arch/x86: Remove "select SRCU"

Now that the SRCU Kconfig option is unconditionally selected, there is
no longer any point in selecting it. Therefore, remove the "select SRCU"
Kconfig statements.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>


# ff61f079 14-Mar-2023 Jonathan Corbet <corbet@lwn.net>

docs: move x86 documentation into Documentation/arch/

Move the x86 documentation under Documentation/arch/ as a way of cleaning
up the top-level directory and making the structure of our docs more
closely match the structure of the source directories it describes.

All in-kernel references to the old paths have been updated.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20230315211523.108836-1-corbet@lwn.net/
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# 54628de6 24-Jan-2023 Randy Dunlap <rdunlap@infradead.org>

x86/Kconfig: Fix spellos & punctuation

Fix spelling (reported by codespell) & punctuation in arch/x86/ Kconfig.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230124181753.19309-1-rdunlap@infradead.org


# 6ca297d4 21-Oct-2022 Peter Zijlstra <peterz@infradead.org>

mm: Rename GUP_GET_PTE_LOW_HIGH

Since it no longer applies to only PTEs, rename it to PXX.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221022114424.776404066%40infradead.org


# 280981d6 14-Nov-2022 Sathvika Vasireddy <sv@linux.ibm.com>

objtool: Add --mnop as an option to --mcount

Some architectures (powerpc) may not support ftrace locations being nop'ed
out at build time. Introduce CONFIG_HAVE_OBJTOOL_NOP_MCOUNT for objtool, as
a means for architectures to enable nop'ing of ftrace locations. Add --mnop
as an option to objtool --mcount, to indicate support for the same.

Also, make sure that --mnop can be passed as an option to objtool only when
--mcount is passed.

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-12-sv@linux.ibm.com


# 4fd5f70c 01-Nov-2022 Kees Cook <keescook@chromium.org>

x86/Kconfig: Enable kernel IBT by default

The kernel IBT defense strongly mitigates the common "first step" of ROP
attacks, by eliminating arbitrary stack pivots (that appear either at
the end of a function or in immediate values), which cannot be reached
if indirect calls must be to marked function entry addresses. IBT is
also required to be enabled to gain the FineIBT feature when built with
Kernel Control Flow Integrity.

Additionally, given that this feature is runtime enabled via CPU ID,
it clearly should be built in by default; it will only be enabled if the
CPU supports it. The build takes 2 seconds longer, which seems a small
price to pay for gaining this coverage by default.

Suggested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221101172503.gonna.094-kees@kernel.org


# 931ab636 27-Oct-2022 Peter Zijlstra <peterz@infradead.org>

x86/ibt: Implement FineIBT

Implement an alternative CFI scheme that merges both the fine-grained
nature of kCFI but also takes full advantage of the coarse grained
hardware CFI as provided by IBT.

To contrast:

kCFI is a pure software CFI scheme and relies on being able to read
text -- specifically the instruction *before* the target symbol, and
does the hash validation *before* doing the call (otherwise control
flow is compromised already).

FineIBT is a software and hardware hybrid scheme; by ensuring every
branch target starts with a hash validation it is possible to place
the hash validation after the branch. This has several advantages:

o the (hash) load is avoided; no memop; no RX requirement.

o IBT WAIT-FOR-ENDBR state is a speculation stop; by placing
the hash validation in the immediate instruction after
the branch target there is a minimal speculation window
and the whole is a viable defence against SpectreBHB.

o Kees feels obliged to mention it is slightly more vulnerable
when the attacker can write code.

Obviously this patch relies on kCFI, but additionally it also relies
on the padding from the call-depth-tracking patches. It uses this
padding to place the hash-validation while the call-sites are
re-written to modify the indirect target to be 16 bytes in front of
the original target, thus hitting this new preamble.

Notably, there is no hardware that needs call-depth-tracking (Skylake)
and supports IBT (Tigerlake and onwards).

Suggested-by: Joao Moreira (Intel) <joao@overdrivepizza.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221027092842.634714496@infradead.org


# b341b20d 28-Oct-2022 Peter Zijlstra <peterz@infradead.org>

x86: Add prefix symbols for function padding

When code is compiled with:

-fpatchable-function-entry=${PADDING_BYTES},${PADDING_BYTES}

functions will have PADDING_BYTES of NOP in front of them. Unwinders
and other things that symbolize code locations will typically
attribute these bytes to the preceding function.

Given that these bytes nominally belong to the following symbol this
mis-attribution is confusing.

Inspired by the fact that CFI_CLANG emits __cfi_##name symbols to
claim these bytes, use objtool to emit __pfx_##name symbols to do
the same when CFI_CLANG is not used.

This then shows the callthunk for symbol 'name' as:

__pfx_##name+0x6/0x10

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Link: https://lkml.kernel.org/r/20221028194453.592512209@infradead.org


# e81dc127 15-Sep-2022 Thomas Gleixner <tglx@linutronix.de>

x86/callthunks: Add call patching for call depth tracking

Mitigating the Intel SKL RSB underflow issue in software requires to
track the call depth. That is every CALL and every RET need to be
intercepted and additional code injected.

The existing retbleed mitigations already include means of redirecting
RET to __x86_return_thunk; this can be re-purposed and RET can be
redirected to another function doing RET accounting.

CALL accounting will use the function padding introduced in prior
patches. For each CALL instruction, the destination symbol's padding
is rewritten to do the accounting and the CALL instruction is adjusted
to call into the padding.

This ensures only affected CPUs pay the overhead of this accounting.
Unaffected CPUs will leave the padding unused and have their 'JMP
__x86_return_thunk' replaced with an actual 'RET' instruction.

Objtool has been modified to supply a .call_sites section that lists
all the 'CALL' instructions. Additionally the paravirt instruction
sites are iterated since they will have been patched from an indirect
call to direct calls (or direct instructions in which case it'll be
ignored).

Module handling and the actual thunk code for SKL will be added in
subsequent steps.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.470877038@infradead.org


# 80e4c1cd 15-Sep-2022 Thomas Gleixner <tglx@linutronix.de>

x86/retbleed: Add X86_FEATURE_CALL_DEPTH

Intel SKL CPUs fall back to other predictors when the RSB underflows. The
only microcode mitigation is IBRS which is insanely expensive. It comes
with performance drops of up to 30% depending on the workload.

A way less expensive, but nevertheless horrible mitigation is to track the
call depth in software and overeagerly fill the RSB when returns underflow
the software counter.

Provide a configuration symbol and a CPU misfeature bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.056176424@infradead.org


# bea75b33 15-Sep-2022 Thomas Gleixner <tglx@linutronix.de>

x86/Kconfig: Introduce function padding

Now that all functions are 16 byte aligned, add 16 bytes of NOP
padding in front of each function. This prepares things for software
call stack tracking and kCFI/FineIBT.

This significantly increases kernel .text size, around 5.1% on a
x86_64-defconfig-ish build.

However, per the random access argument used for alignment, these 16
extra bytes are code that wouldn't be used. Performance measurements
back this up by showing no significant performance regressions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111146.950884492@infradead.org


# 8f7c0d8b 15-Sep-2022 Thomas Gleixner <tglx@linutronix.de>

x86/Kconfig: Add CONFIG_CALL_THUNKS

In preparation for mitigating the Intel SKL RSB underflow issue in
software, add a new configuration symbol which allows to build the
required call thunk infrastructure conditionally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111146.849523555@infradead.org


# d49a0626 15-Sep-2022 Peter Zijlstra <peterz@infradead.org>

arch: Introduce CONFIG_FUNCTION_ALIGNMENT

Generic function-alignment infrastructure.

Architectures can select FUNCTION_ALIGNMENT_xxB symbols; the
FUNCTION_ALIGNMENT symbol is then set to the largest such selected
size, 0 otherwise.

From this the -falign-functions compiler argument and __ALIGN macro
are set.

This incorporates the DEBUG_FORCE_FUNCTION_ALIGN_64B knob and future
alignment requirements for x86_64 (later in this series) into a single
place.

NOTE: also removes the 0x90 filler byte from the generic __ALIGN
primitive, that value makes no sense outside of x86.

NOTE: .balign 0 reverts to a no-op.

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111143.719248727@infradead.org


# f71806d8 02-Sep-2022 Christophe Leroy <christophe.leroy@csgroup.eu>

x86: Remove CONFIG_ARCH_NR_GPIO

CONFIG_ARCH_NR_GPIO is not used anymore, remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>


# e3998434 29-Nov-2022 Mateusz Jończyk <mat.jonczyk@o2.pl>

x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOS

A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on
platforms that have x2APIC already enabled in the BIOS before starting the
kernel.

The kernel was supposed to panic with an approprite error message in
validate_x2apic() due to the missing X2APIC support.

However, validate_x2apic() was run too late in the boot cycle, and the
kernel tried to initialize the APIC nonetheless. This resulted in an
earlier panic in setup_local_APIC() because the APIC was not registered.

In my experiments, a panic message in setup_local_APIC() was not visible
in the graphical console, which resulted in a hang with no indication
what has gone wrong.

Instead of calling panic(), disable the APIC, which results in a somewhat
working system with the PIC only (and no SMP). This way the user is able to
diagnose the problem more easily.

Disabling X2APIC mode is not an option because it's impossible on systems
with locked x2APIC.

The proper place to disable the APIC in this case is in check_x2apic(),
which is called early from setup_arch(). Doing this in
__apic_intr_mode_select() is too late.

Make check_x2apic() unconditionally available and remove the empty stub.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Robert Elliott (Servers) <elliott@hpe.com>
Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/d573ba1c-0dc4-3016-712a-cc23a8a33d42@molgen.mpg.de
Link: https://lore.kernel.org/lkml/20220911084711.13694-3-mat.jonczyk@o2.pl
Link: https://lore.kernel.org/all/20221129215008.7247-1-mat.jonczyk@o2.pl


# cc3fdda2 22-Nov-2022 Ard Biesheuvel <ardb@kernel.org>

x86/efi: Make the deprecated EFI handover protocol optional

The EFI handover protocol permits a bootloader to invoke the kernel as a
EFI PE/COFF application, while passing a bootparams struct as a third
argument to the entrypoint function call.

This has no basis in the UEFI specification, and there are better ways
to pass additional data to a UEFI application (UEFI configuration
tables, UEFI variables, UEFI protocols) than going around the
StartImage() boot service and jumping to a fixed offset in the loaded
image, just to call a different function that takes a third parameter.

The reason for handling struct bootparams in the bootloader was that the
EFI stub could only load initrd images from the EFI system partition,
and so passing it via struct bootparams was needed for loaders like
GRUB, which pass the initrd in memory, and may load it from anywhere,
including from the network. Another motivation was EFI mixed mode, which
could not use the initrd loader in the EFI stub at all due to 32/64 bit
incompatibilities (which will be fixed shortly [0]), and could not
invoke the ordinary PE/COFF entry point either, for the same reasons.

Given that loaders such as GRUB already carried the bootparams handling
in order to implement non-EFI boot, retaining that code and just passing
bootparams to the EFI stub was a reasonable choice (although defining an
alternate entrypoint could have been avoided.) However, the GRUB side
changes never made it upstream, and are only shipped by some of the
distros in their downstream versions.

In the meantime, EFI support has been added to other Linux architecture
ports, as well as to U-boot and systemd, including arch-agnostic methods
for passing initrd images in memory [1], and for doing mixed mode boot
[2], none of them requiring anything like the EFI handover protocol. So
given that only out-of-tree distro GRUB relies on this, let's permit it
to be omitted from the build, in preparation for retiring it completely
at a later date. (Note that systemd-boot does have an implementation as
well, but only uses it as a fallback for booting images that do not
implement the LoadFile2 based initrd loading method, i.e., v5.8 or older)

[0] https://lore.kernel.org/all/20220927085842.2860715-1-ardb@kernel.org/
[1] ec93fc371f01 ("efi/libstub: Add support for loading the initrd from a device path")
[2] 97aa276579b2 ("efi/x86: Add true mixed mode entry point into .compat section")

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221122161017.2426828-18-ardb@kernel.org


# 1fff234d 07-Nov-2022 Ard Biesheuvel <ardb@kernel.org>

efi: x86: Move EFI runtime map sysfs code to arch/x86

The EFI runtime map code is only wired up on x86, which is the only
architecture that has a need for it in its implementation of kexec.

So let's move this code under arch/x86 and drop all references to it
from generic code. To ensure that the efi_runtime_map_init() is invoked
at the appropriate time use a 'sync' subsys_initcall() that will be
called right after the EFI initcall made from generic code where the
original invocation of efi_runtime_map_init() resided.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Dave Young <dyoung@redhat.com>


# 4059ba65 01-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

efi: memmap: Move EFI fake memmap support into x86 arch tree

The EFI fake memmap support is specific to x86, which manipulates the
EFI memory map in various different ways after receiving it from the EFI
stub. On other architectures, we have managed to push back on this, and
the EFI memory map is kept pristine.

So let's move the fake memmap code into the x86 arch tree, where it
arguably belongs.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# a474d3fb 11-Nov-2022 Thomas Gleixner <tglx@linutronix.de>

PCI/MSI: Get rid of PCI_MSI_IRQ_DOMAIN

What a zoo:

PCI_MSI
select GENERIC_MSI_IRQ

PCI_MSI_IRQ_DOMAIN
def_bool y
depends on PCI_MSI
select GENERIC_MSI_IRQ_DOMAIN

Ergo PCI_MSI enables PCI_MSI_IRQ_DOMAIN which in turn selects
GENERIC_MSI_IRQ_DOMAIN. So all the dependencies on PCI_MSI_IRQ_DOMAIN are
just an indirection to PCI_MSI.

Match the reality and just admit that PCI_MSI requires
GENERIC_MSI_IRQ_DOMAIN.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20221111122014.467556921@linutronix.de


# 1156b441 28-Oct-2022 Davidlohr Bueso <dave@stgolabs.net>

memregion: Add cpu_cache_invalidate_memregion() interface

With CXL security features, and CXL dynamic provisioning, global CPU
cache flushing nvdimm requirements are no longer specific to that
subsystem, even beyond the scope of security_ops. CXL will need such
semantics for features not necessarily limited to persistent memory.

The functionality this is enabling is to be able to instantaneously
secure erase potentially terabytes of memory at once and the kernel
needs to be sure that none of the data from before the erase is still
present in the cache. It is also used when unlocking a memory device
where speculative reads and firmware accesses could have cached poison
from before the device was unlocked. Lastly this facility is used when
mapping new devices, or new capacity into an established physical
address range. I.e. when the driver switches DeviceA mapping AddressX to
DeviceB mapping AddressX then any cached data from DeviceA:AddressX
needs to be invalidated.

This capability is typically only used once per-boot (for unlock), or
once per bare metal provisioning event (secure erase), like when handing
off the system to another tenant or decommissioning a device. It may
also be used for dynamic CXL region provisioning.

Users must first call cpu_cache_has_invalidate_memregion() to know
whether this functionality is available on the architecture. On x86 this
respects the constraints of when wbinvd() is tolerable. It is already
the case that wbinvd() is problematic to allow in VMs due its global
performance impact and KVM, for example, has been known to just trap and
ignore the call. With confidential computing guest execution of wbinvd()
may even trigger an exception. Given guests should not be messing with
the bare metal address map via CXL configuration changes
cpu_cache_has_invalidate_memregion() returns false in VMs.

While this global cache invalidation facility, is exported to modules,
since NVDIMM and CXL support can be built as a module, it is not for
general use. The intent is that this facility is not available outside
of specific "device-memory" use cases. To make that expectation as clear
as possible the API is scoped to a new "DEVMEM" module namespace that
only the NVDIMM and CXL subsystems are expected to import.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 49f88c70 28-Sep-2022 Paul E. McKenney <paulmck@kernel.org>

arch/x86: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option

The x86 architecture uses an add-to-memory instruction to implement
this_cpu_add(), which is NMI safe. This means that the old and
more-efficient srcu_read_lock() may be used in NMI context, without
the need for srcu_read_lock_nmisafe(). Therefore, add the new Kconfig
option ARCH_HAS_NMI_SAFE_THIS_CPU_OPS to arch/x86/Kconfig, which will
cause NEED_SRCU_NMI_SAFE to be deselected, thus preserving the current
srcu_read_lock() behavior.

Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: <x86@kernel.org>


# 33806e7c 29-Sep-2022 Nathan Chancellor <nathan@kernel.org>

x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUB

A recent change in LLVM made CONFIG_EFI_STUB unselectable because it no
longer pretends to support -mabi=ms, breaking the dependency in
Kconfig. Lack of CONFIG_EFI_STUB can prevent kernels from booting via
EFI in certain circumstances.

This check was added by

8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi")

to ensure that __attribute__((ms_abi)) was available, as -mabi=ms is
not actually used in any cflags.

According to the GCC documentation, this attribute has been supported
since GCC 4.4.7. The kernel currently requires GCC 5.1 so this check is
not necessary; even when that change landed in 5.6, the kernel required
GCC 4.9 so it was unnecessary then as well.

Clang supports __attribute__((ms_abi)) for all versions that are
supported for building the kernel so no additional check is needed.
Remove the 'depends on' line altogether to allow CONFIG_EFI_STUB to be
selected when CONFIG_EFI is enabled, regardless of compiler.

Fixes: 8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: stable@vger.kernel.org
Link: https://github.com/llvm/llvm-project/commit/d1ad006a8f64bdc17f618deffa9e7c91d82c444d


# 4ca8cc8d 15-Sep-2022 Alexander Potapenko <glider@google.com>

x86: kmsan: enable KMSAN builds for x86

Make KMSAN usable by adding the necessary Kconfig bits.

Also declare x86-specific functions checking address validity in
arch/x86/include/asm/kmsan.h.

Link: https://lkml.kernel.org/r/20220915150417.722975-44-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 7cf8f44a 15-Sep-2022 Alexander Potapenko <glider@google.com>

x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS

dentry_string_cmp() calls read_word_at_a_time(), which might read
uninitialized bytes to optimize string comparisons. Disabling
CONFIG_DCACHE_WORD_ACCESS should prohibit this optimization, as well as
(probably) similar ones.

Link: https://lkml.kernel.org/r/20220915150417.722975-39-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# d911c67e 15-Sep-2022 Alexander Potapenko <glider@google.com>

x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN

This is needed to allow memory tools like KASAN and KMSAN see the memory
accesses from the checksum code. Without CONFIG_GENERIC_CSUM the tools
can't see memory accesses originating from handwritten assembly code.

For KASAN it's a question of detecting more bugs, for KMSAN using the C
implementation also helps avoid false positives originating from seemingly
uninitialized checksum values.

Link: https://lkml.kernel.org/r/20220915150417.722975-38-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# eed9a328 18-Sep-2022 Yu Zhao <yuzhao@google.com>

mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG

Some architectures support the accessed bit in non-leaf PMD entries, e.g.,
x86 sets the accessed bit in a non-leaf PMD entry when using it as part of
linear address translation [1]. Page table walkers that clear the
accessed bit may use this capability to reduce their search space.

Note that:
1. Although an inline function is preferable, this capability is added
as a configuration option for consistency with the existing macros.
2. Due to the little interest in other varieties, this capability was
only tested on Intel and AMD CPUs.

Thanks to the following developers for their efforts [2][3].
Randy Dunlap <rdunlap@infradead.org>
Stephen Rothwell <sfr@canb.auug.org.au>

[1]: Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3 (June 2021), section 4.8
[2] https://lore.kernel.org/r/bfdcc7c8-922f-61a9-aa15-7e7250f04af7@infradead.org/
[3] https://lore.kernel.org/r/20220413151513.5a0d7a7e@canb.auug.org.au/

Link: https://lkml.kernel.org/r/20220918080010.2920238-3-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# ceea991a 03-Sep-2022 Jiri Olsa <jolsa@kernel.org>

bpf: Move bpf_dispatcher function out of ftrace locations

The dispatcher function is attached/detached to trampoline by
dispatcher update function. At the same time it's available as
ftrace attachable function.

After discussion [1] the proposed solution is to use compiler
attributes to alter bpf_dispatcher_##name##_func function:

- remove it from being instrumented with __no_instrument_function__
attribute, so ftrace has no track of it

- but still generate 5 nop instructions with patchable_function_entry(5)
attribute, which are expected by bpf_arch_text_poke used by
dispatcher update function

Enabling HAVE_DYNAMIC_FTRACE_NO_PATCHABLE option for x86, so
__patchable_function_entries functions are not part of ftrace/mcount
locations.

Adding attributes to bpf_dispatcher_XXX function on x86_64 so it's
kept out of ftrace locations and has 5 byte nop generated at entry.

These attributes need to be arch specific as pointed out by Ilya
Leoshkevic in here [2].

The dispatcher image is generated only for x86_64 arch, so the
code can stay as is for other archs.

[1] https://lore.kernel.org/bpf/20220722110811.124515-1-jolsa@kernel.org/
[2] https://lore.kernel.org/bpf/969a14281a7791c334d476825863ee449964dd0c.camel@linux.ibm.com/

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20220903131154.420467-3-jolsa@kernel.org


# 7987448f 13-Jul-2022 Stephen Kitt <steve@sk2.org>

x86/Kconfig: Specify idle=poll instead of no-hlt

Commit 27be45700021 ("x86 idle: remove 32-bit-only "no-hlt" parameter,
hlt_works_ok flag") removed no-hlt, but CONFIG_APM still refers to
it. Suggest "idle=poll" instead, based on the commit message:

> If a user wants to avoid HLT, then "idle=poll"
> is much more useful, as it avoids invocation of HLT
> in idle, while "no-hlt" failed to do so.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220713160840.1577569-1-steve@sk2.org


# b8d1d163 16-Aug-2022 Daniel Sneddon <daniel.sneddon@linux.intel.com>

x86/apic: Don't disable x2APIC if locked

The APIC supports two modes, legacy APIC (or xAPIC), and Extended APIC
(or x2APIC). X2APIC mode is mostly compatible with legacy APIC, but
it disables the memory-mapped APIC interface in favor of one that uses
MSRs. The APIC mode is controlled by the EXT bit in the APIC MSR.

The MMIO/xAPIC interface has some problems, most notably the APIC LEAK
[1]. This bug allows an attacker to use the APIC MMIO interface to
extract data from the SGX enclave.

Introduce support for a new feature that will allow the BIOS to lock
the APIC in x2APIC mode. If the APIC is locked in x2APIC mode and the
kernel tries to disable the APIC or revert to legacy APIC mode a GP
fault will occur.

Introduce support for a new MSR (IA32_XAPIC_DISABLE_STATUS) and handle
the new locked mode when the LEGACY_XAPIC_DISABLED bit is set by
preventing the kernel from trying to disable the x2APIC.

On platforms with the IA32_XAPIC_DISABLE_STATUS MSR, if SGX or TDX are
enabled the LEGACY_XAPIC_DISABLED will be set by the BIOS. If
legacy APIC is required, then it SGX and TDX need to be disabled in the
BIOS.

[1]: https://aepicleak.com/aepicleak.pdf

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Link: https://lkml.kernel.org/r/20220816231943.1152579-1-daniel.sneddon@linux.intel.com


# 09498135 03-Aug-2022 Miguel Ojeda <ojeda@kernel.org>

x86: enable initial Rust support

Note that only x86_64 is covered and not all features nor mitigations
are handled, but it is enough as a starting point and showcases
the basics needed to add Rust support for a new architecture.

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com>
Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com>
Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Co-developed-by: David Gow <davidgow@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>


# 3c516f89 08-Sep-2022 Sami Tolvanen <samitolvanen@google.com>

x86: Add support for CONFIG_CFI_CLANG

With CONFIG_CFI_CLANG, the compiler injects a type preamble immediately
before each function and a check to validate the target function type
before indirect calls:

; type preamble
__cfi_function:
mov <id>, %eax
function:
...
; indirect call check
mov -<id>,%r10d
add -0x4(%r11),%r10d
je .Ltmp1
ud2
.Ltmp1:
call __x86_indirect_thunk_r11

Add error handling code for the ud2 traps emitted for the checks, and
allow CONFIG_CFI_CLANG to be selected on x86_64.

This produces the following oops on CFI failure (generated using lkdtm):

[ 21.441706] CFI failure at lkdtm_indirect_call+0x16/0x20 [lkdtm]
(target: lkdtm_increment_int+0x0/0x10 [lkdtm]; expected type: 0x7e0c52a)
[ 21.444579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 21.445296] CPU: 0 PID: 132 Comm: sh Not tainted
5.19.0-rc8-00020-g9f27360e674c #1
[ 21.445296] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 21.445296] RIP: 0010:lkdtm_indirect_call+0x16/0x20 [lkdtm]
[ 21.445296] Code: 52 1c c0 48 c7 c1 c5 50 1c c0 e9 25 48 2a cc 0f 1f
44 00 00 49 89 fb 48 c7 c7 50 b4 1c c0 41 ba 5b ad f3 81 45 03 53 f8
[ 21.445296] RSP: 0018:ffffa9f9c02ffdc0 EFLAGS: 00000292
[ 21.445296] RAX: 0000000000000027 RBX: ffffffffc01cb300 RCX: 385cbbd2e070a700
[ 21.445296] RDX: 0000000000000000 RSI: c0000000ffffdfff RDI: ffffffffc01cb450
[ 21.445296] RBP: 0000000000000006 R08: 0000000000000000 R09: ffffffff8d081610
[ 21.445296] R10: 00000000bcc90825 R11: ffffffffc01c2fc0 R12: 0000000000000000
[ 21.445296] R13: ffffa31b827a6000 R14: 0000000000000000 R15: 0000000000000002
[ 21.445296] FS: 00007f08b42216a0(0000) GS:ffffa31b9f400000(0000)
knlGS:0000000000000000
[ 21.445296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 21.445296] CR2: 0000000000c76678 CR3: 0000000001940000 CR4: 00000000000006f0
[ 21.445296] Call Trace:
[ 21.445296] <TASK>
[ 21.445296] lkdtm_CFI_FORWARD_PROTO+0x30/0x50 [lkdtm]
[ 21.445296] direct_entry+0x12d/0x140 [lkdtm]
[ 21.445296] full_proxy_write+0x5d/0xb0
[ 21.445296] vfs_write+0x144/0x460
[ 21.445296] ? __x64_sys_wait4+0x5a/0xc0
[ 21.445296] ksys_write+0x69/0xd0
[ 21.445296] do_syscall_64+0x51/0xa0
[ 21.445296] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 21.445296] RIP: 0033:0x7f08b41a6fe1
[ 21.445296] Code: be 07 00 00 00 41 89 c0 e8 7e ff ff ff 44 89 c7 89
04 24 e8 91 c6 02 00 8b 04 24 48 83 c4 68 c3 48 63 ff b8 01 00 00 03
[ 21.445296] RSP: 002b:00007ffcdf65c2e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 21.445296] RAX: ffffffffffffffda RBX: 00007f08b4221690 RCX: 00007f08b41a6fe1
[ 21.445296] RDX: 0000000000000012 RSI: 0000000000c738f0 RDI: 0000000000000001
[ 21.445296] RBP: 0000000000000001 R08: fefefefefefefeff R09: fefefefeffc5ff4e
[ 21.445296] R10: 00007f08b42222b0 R11: 0000000000000246 R12: 0000000000c738f0
[ 21.445296] R13: 0000000000000012 R14: 00007ffcdf65c401 R15: 0000000000c70450
[ 21.445296] </TASK>
[ 21.445296] Modules linked in: lkdtm
[ 21.445296] Dumping ftrace buffer:
[ 21.445296] (ftrace buffer empty)
[ 21.471442] ---[ end trace 0000000000000000 ]---
[ 21.471811] RIP: 0010:lkdtm_indirect_call+0x16/0x20 [lkdtm]
[ 21.472467] Code: 52 1c c0 48 c7 c1 c5 50 1c c0 e9 25 48 2a cc 0f 1f
44 00 00 49 89 fb 48 c7 c7 50 b4 1c c0 41 ba 5b ad f3 81 45 03 53 f8
[ 21.474400] RSP: 0018:ffffa9f9c02ffdc0 EFLAGS: 00000292
[ 21.474735] RAX: 0000000000000027 RBX: ffffffffc01cb300 RCX: 385cbbd2e070a700
[ 21.475664] RDX: 0000000000000000 RSI: c0000000ffffdfff RDI: ffffffffc01cb450
[ 21.476471] RBP: 0000000000000006 R08: 0000000000000000 R09: ffffffff8d081610
[ 21.477127] R10: 00000000bcc90825 R11: ffffffffc01c2fc0 R12: 0000000000000000
[ 21.477959] R13: ffffa31b827a6000 R14: 0000000000000000 R15: 0000000000000002
[ 21.478657] FS: 00007f08b42216a0(0000) GS:ffffa31b9f400000(0000)
knlGS:0000000000000000
[ 21.479577] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 21.480307] CR2: 0000000000c76678 CR3: 0000000001940000 CR4: 00000000000006f0
[ 21.481460] Kernel panic - not syncing: Fatal exception

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220908215504.3686827-23-samitolvanen@google.com


# 3d923c5f 10-Jul-2022 Anshuman Khandual <anshuman.khandual@arm.com>

mm/mmap: drop ARCH_HAS_VM_GET_PAGE_PROT

Now all the platforms enable ARCH_HAS_GET_PAGE_PROT. They define and
export own vm_get_page_prot() whether custom or standard
DECLARE_VM_GET_PAGE_PROT. Hence there is no need for default generic
fallback for vm_get_page_prot(). Just drop this fallback and also
ARCH_HAS_GET_PAGE_PROT mechanism.

Link: https://lkml.kernel.org/r/20220711070600.2378316-27-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 4313a249 23-May-2022 Arnd Bergmann <arnd@arndb.de>

arch/*/: remove CONFIG_VIRT_TO_BUS

All architecture-independent users of virt_to_bus() and bus_to_virt()
have been fixed to use the dma mapping interfaces or have been
removed now. This means the definitions on most architectures, and the
CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.

The only exceptions to this are a few network and scsi drivers for m68k
Amiga and VME machines and ppc32 Macintosh. These drivers work correctly
with the old interfaces and are probably not worth changing.

On alpha and parisc, virt_to_bus() were still used in asm/floppy.h.
alpha can use isa_virt_to_bus() like x86 does, and parisc can just
open-code the virt_to_phys() here, as this is architecture specific
code.

I tried updating the bus-virt-phys-mapping.rst documentation, which
started as an email from Linus to explain some details of the Linux-2.0
driver interfaces. The bits about virt_to_bus() were declared obsolete
backin 2000, and the rest is not all that relevant any more, so in the
end I just decided to remove the file completely.

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 9592eef7 05-Jul-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove CONFIG_ARCH_RANDOM

When RDRAND was introduced, there was much discussion on whether it
should be trusted and how the kernel should handle that. Initially, two
mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and
"nordrand", a boot-time switch.

Later the thinking evolved. With a properly designed RNG, using RDRAND
values alone won't harm anything, even if the outputs are malicious.
Rather, the issue is whether those values are being *trusted* to be good
or not. And so a new set of options were introduced as the real
ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu".
With these options, RDRAND is used, but it's not always credited. So in
the worst case, it does nothing, and in the best case, maybe it helps.

Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the
center and became something certain platforms force-select.

The old options don't really help with much, and it's a bit odd to have
special handling for these instructions when the kernel can deal fine
with the existence or untrusted existence or broken existence or
non-existence of that CPU capability.

Simplify the situation by removing CONFIG_ARCH_RANDOM and using the
ordinary asm-generic fallback pattern instead, keeping the two options
that are actually used. For now it leaves "nordrand" for now, as the
removal of that will take a different route.

Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 24a9c541 08-Jun-2022 Frederic Weisbecker <frederic@kernel.org>

context_tracking: Split user tracking Kconfig

Context tracking is going to be used not only to track user transitions
but also idle/IRQs/NMIs. The user tracking part will then become a
separate feature. Prepare Kconfig for that.

[ frederic: Apply Max Filippov feedback. ]

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>


# 2d17bd24 24-Jul-2022 Masahiro Yamada <masahiroy@kernel.org>

x86/purgatory: Omit use of bin2c

The .incbin assembler directive is much faster than bin2c + $(CC).

Do similar refactoring as in

4c0f032d4963 ("s390/purgatory: Omit use of bin2c").

Please note the .quad directive matches to size_t in C (both 8
byte) because the purgatory is compiled only for the 64-bit kernel.
(KEXEC_FILE depends on X86_64).

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220725020812.622255-2-masahiroy@kernel.org


# 1b866781 17-Jun-2022 Nathan Chancellor <nathan@kernel.org>

x86/Kconfig: Fix CONFIG_CC_HAS_SANE_STACKPROTECTOR when cross compiling with clang

Chimera Linux notes that CONFIG_CC_HAS_SANE_STACKPROTECTOR cannot be
enabled when cross compiling an x86_64 kernel with clang, even though it
does work when natively compiling.

When building on aarch64:

$ make -sj"$(nproc)" ARCH=x86_64 LLVM=1 defconfig

$ grep STACKPROTECTOR .config

When building on x86_64:

$ make -sj"$(nproc)" ARCH=x86_64 LLVM=1 defconfig

$ grep STACKPROTECTOR .config
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
CONFIG_HAVE_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y

When clang is invoked without a '--target' flag, code is generated for
the default target, which is usually the host (it is configurable via
cmake). As a result, the has-stack-protector scripts will generate code
for the default target but check for x86 specific segment registers,
which cannot succeed if the default target is not x86.

$(CLANG_FLAGS) contains an explicit '--target' flag so pass that
variable along to the has-stack-protector scripts so that the stack
protector can be enabled when cross compiling with clang. The 32-bit
stack protector cannot currently be enabled with clang, as it does not
support '-mstack-protector-guard-symbol', so this results in no
functional change for ARCH=i386 when cross compiling.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://github.com/chimera-linux/cports/commit/0fb7e506d5f83fdf2104feb22cdac34934561226
Link: https://github.com/llvm/llvm-project/issues/48553
Link: https://lkml.kernel.org/r/20220617180845.2788442-1-nathan@kernel.org


# b69a2afd 30-Jun-2022 Jonathan McDowell <noodles@fb.com>

x86/kexec: Carry forward IMA measurement log on kexec

On kexec file load, the Integrity Measurement Architecture (IMA)
subsystem may verify the IMA signature of the kernel and initramfs, and
measure it. The command line parameters passed to the kernel in the
kexec call may also be measured by IMA.

A remote attestation service can verify a TPM quote based on the TPM
event log, the IMA measurement list and the TPM PCR data. This can
be achieved only if the IMA measurement log is carried over from the
current kernel to the next kernel across the kexec call.

PowerPC and ARM64 both achieve this using device tree with a
"linux,ima-kexec-buffer" node. x86 platforms generally don't make use of
device tree, so use the setup_data mechanism to pass the IMA buffer to
the new kernel.

Signed-off-by: Jonathan McDowell <noodles@fb.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> # IMA function definitions
Link: https://lore.kernel.org/r/YmKyvlF3my1yWTvK@noodles-fedora-PC23Y6EG


# 4510bffb 11-May-2022 Mark Rutland <mark.rutland@arm.com>

arch: make TRACE_IRQFLAGS_NMI_SUPPORT generic

On most architectures, IRQ flag tracing is disabled in NMI context, and
architectures need to define and select TRACE_IRQFLAGS_NMI_SUPPORT in
order to enable this.

Commit:

859d069ee1ddd878 ("lockdep: Prepare for NMI IRQ state tracking")

Permitted IRQ flag tracing in NMI context, allowing lockdep to work in
NMI context where an architecture had suitable entry logic. At the time,
most architectures did not have such suitable entry logic, and this broke
lockdep on such architectures. Thus, this was partially disabled in
commit:

ed00495333ccc80f ("locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs")

... with architectures needing to select TRACE_IRQFLAGS_NMI_SUPPORT to
enable IRQ flag tracing in NMI context.

Currently TRACE_IRQFLAGS_NMI_SUPPORT is defined under
arch/x86/Kconfig.debug. Move it to arch/Kconfig so architectures can
select it without having to provide their own definition.

Since the regular TRACE_IRQFLAGS_SUPPORT is selected by
arch/x86/Kconfig, the select of TRACE_IRQFLAGS_NMI_SUPPORT is moved
there too.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220511131733.4074499-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>


# b648ab48 23-Jul-2022 Ben Hutchings <ben@decadent.org.uk>

x86/speculation: Make all RETbleed mitigations 64-bit only

The mitigations for RETBleed are currently ineffective on x86_32 since
entry_32.S does not use the required macros. However, for an x86_32
target, the kconfig symbols for them are still enabled by default and
/sys/devices/system/cpu/vulnerabilities/retbleed will wrongly report
that mitigations are in place.

Make all of these symbols depend on X86_64, and only enable RETHUNK by
default on X86_64.

Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YtwSR3NNsWp1ohfV@decadent.org.uk


# 1e9fdf21 08-Jul-2022 Peter Zijlstra <peterz@infradead.org>

mmu_gather: Remove per arch tlb_{start,end}_vma()

Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
Provide two new MMU_GATHER_knobs to enumerate them and remove the per
arch tlb_{start,end}_vma() implementations.

- MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
but does *NOT* want to call it for each VMA.

- MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
invalidate across multiple VMAs if possible.

With these it is possible to capture the three forms:

1) empty stubs;
select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS

2) start: flush_cache_range(), end: empty;
select MMU_GATHER_MERGE_VMAS

3) start: flush_cache_range(), end: flush_tlb_range();
default

Obviously, if the architecture does not have flush_cache_range() then
it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f43b9876 27-Jun-2022 Peter Zijlstra <peterz@infradead.org>

x86/retbleed: Add fine grained Kconfig knobs

Do fine-grained Kconfig for all the various retbleed parts.

NOTE: if your compiler doesn't support return thunks this will
silently 'upgrade' your mitigation to IBPB, you might not like this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>


# 7fbf47c7 14-Jun-2022 Alexandre Chartre <alexandre.chartre@oracle.com>

x86/bugs: Add AMD retbleed= boot parameter

Add the "retbleed=<value>" boot parameter to select a mitigation for
RETBleed. Possible values are "off", "auto" and "unret"
(JMP2RET mitigation). The default value is "auto".

Currently, "retbleed=auto" will select the unret mitigation on
AMD and Hygon and no mitigation on Intel (JMP2RET is not effective on
Intel).

[peterz: rebase; add hygon]
[jpoimboe: cleanups]

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>


# 3f9dfbeb 06-Jun-2022 Juergen Gross <jgross@suse.com>

virtio: replace arch_has_restricted_virtio_memory_access()

Instead of using arch_has_restricted_virtio_memory_access() together
with CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, replace those
with platform_has() and a new platform feature
PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> # Arm64 only
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Borislav Petkov <bp@suse.de>


# a77a94f8 25-May-2022 Borislav Petkov <bp@suse.de>

x86/microcode: Default-disable late loading

It is dangerous and it should not be used anyway - there's a nice early
loading already.

Requested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220525161232.14924-3-bp@alien8.de


# 181b6f40 25-May-2022 Borislav Petkov <bp@suse.de>

x86/microcode: Rip out the OLD_INTERFACE

Everything should be using the early initrd loading by now.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220525161232.14924-2-bp@alien8.de


# 5f3da8c0 19-Apr-2022 Josh Poimboeuf <jpoimboe@kernel.org>

objtool: Add CONFIG_HAVE_UACCESS_VALIDATION

Allow an arch specify that it has objtool uaccess validation with
CONFIG_HAVE_UACCESS_VALIDATION. For now, doing so unconditionally
selects CONFIG_OBJTOOL.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/d393d5e2fe73aec6e8e41d5c24f4b6fe8583f2d8.1650384225.git.jpoimboe@redhat.com


# 758cd94a 25-May-2022 Juerg Haefliger <juerg.haefliger@canonical.com>

x86/Kconfig: Fix indentation and add endif comments to arch/x86/Kconfig

The convention for indentation seems to be a single tab. Help text is
further indented by an additional two whitespaces. Fix the lines that
violate these rules.

While add it, add missing trailing endif comments and squeeze multiple
empty lines.

Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220525133203.52463-2-juerg.haefliger@canonical.com


# 0cbed0ee 05-Apr-2022 Guo Ren <guoren@kernel.org>

arch: Add SYSVIPC_COMPAT for all architectures

The existing per-arch definitions are pretty much historic cruft.
Move SYSVIPC_COMPAT into init/Kconfig.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Link: https://lore.kernel.org/r/20220405071314.3225832-5-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 47010c04 29-Apr-2022 Muchun Song <songmuchun@bytedance.com>

mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*

The word of "free" is not expressive enough to express the feature of
optimizing vmemmap pages associated with each HugeTLB, rename this keywork
to "optimize". In this patch , cheanup configs to make code more
expressive.

Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# e10cd4b0 29-Apr-2022 Christoph Hellwig <hch@infradead.org>

x86/mm: enable ARCH_HAS_VM_GET_PAGE_PROT

This defines and exports a platform specific custom vm_get_page_prot() via
subscribing ARCH_HAS_VM_GET_PAGE_PROT. This also unsubscribes from config
ARCH_HAS_FILTER_PGPROT, after dropping off arch_filter_pgprot() and
arch_vm_get_page_prot().

Link: https://lkml.kernel.org/r/20220414062125.609297-6-anshuman.khandual@arm.com
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 2e4ec02b 29-Apr-2022 Muchun Song <songmuchun@bytedance.com>

mm: hugetlb_vmemmap: introduce ARCH_WANT_HUGETLB_PAGE_FREE_VMEMMAP

The feature of minimizing overhead of struct page associated with each
HugeTLB page is implemented on x86_64, however, the infrastructure of this
feature is already there, we could easily enable it for other
architectures. Introduce ARCH_WANT_HUGETLB_PAGE_FREE_VMEMMAP for other
architectures to be easily enabled. Just select this config if they want
to enable this feature.

Link: https://lkml.kernel.org/r/20220331065640.5777-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Barry Song <baohua@kernel.org>
Tested-by: Barry Song <baohua@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 489e355b 18-Apr-2022 Josh Poimboeuf <jpoimboe@redhat.com>

objtool: Add HAVE_NOINSTR_VALIDATION

Remove CONFIG_NOINSTR_VALIDATION's dependency on HAVE_OBJTOOL, since
other arches might want to implement objtool without it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/488e94f69db4df154499bc098573d90e5db1c826.1650300597.git.jpoimboe@redhat.com


# 22102f45 18-Apr-2022 Josh Poimboeuf <jpoimboe@redhat.com>

objtool: Make noinstr hacks optional

Objtool has some hacks in place to workaround toolchain limitations
which otherwise would break no-instrumentation rules. Make the hacks
explicit (and optional for other arches) by turning it into a cmdline
option and kernel config option.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/b326eeb9c33231b9dfbb925f194ed7ee40edcd7c.1650300597.git.jpoimboe@redhat.com


# 4ab7674f 18-Apr-2022 Josh Poimboeuf <jpoimboe@redhat.com>

objtool: Make jump label hack optional

Objtool secretly does a jump label hack to overcome the limitations of
the toolchain. Make the hack explicit (and optional for other arches)
by turning it into a cmdline option and kernel config option.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/3bdcbfdd27ecb01ddec13c04bdf756a583b13d24.1650300597.git.jpoimboe@redhat.com


# 03f16cd0 18-Apr-2022 Josh Poimboeuf <jpoimboe@redhat.com>

objtool: Add CONFIG_OBJTOOL

Now that stack validation is an optional feature of objtool, add
CONFIG_OBJTOOL and replace most usages of CONFIG_STACK_VALIDATION with
it.

CONFIG_STACK_VALIDATION can now be considered to be frame-pointer
specific. CONFIG_UNWINDER_ORC is already inherently valid for live
patching, so no need to "validate" it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/939bf3d85604b2a126412bf11af6e3bd3b872bcb.1650300597.git.jpoimboe@redhat.com


# 8b766b0f 25-Feb-2022 Michal Suchanek <msuchanek@suse.de>

sysfb: Enable boot time VESA graphic mode selection

Since switch to simplefb/simpledrm VESA graphic mode selection with vga=
kernel parameter is no longer available with legacy BIOS.

The x86 realmode boot code enables the VESA graphic modes when option
FB_BOOT_VESA_SUPPORT is enabled.

This option is selected by vesafb but not simplefb/simpledrm.

To enable use of VESA modes with simplefb in legacy BIOS boot mode drop
dependency of BOOT_VESA_SUPPORT on FB, also drop the FB_ prefix. Select
the option from sysfb rather than the drivers that depend on it.

The BOOT_VESA_SUPPORT is not specific to framebuffer but rather to x86
platform, move it from fbdev to x86 Kconfig.

Fixes: e3263ab389a7 ("x86: provide platform-devices for boot-framebuffers")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/948c39940a4e99f5b43bdbcbe537faae71a43e1d.1645822213.git.msuchanek@suse.de


# 9c55d99e 19-May-2022 Borislav Petkov <bp@suse.de>

x86/microcode: Add explicit CPU vendor dependency

Add an explicit dependency to the respective CPU vendor so that the
respective microcode support for it gets built only when that support is
enabled.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/8ead0da9-9545-b10d-e3db-7df1a1f219e4@infradead.org


# bf00745e 11-May-2022 Andy Lutomirski <luto@kernel.org>

x86/vsyscall: Remove CONFIG_LEGACY_VSYSCALL_EMULATE

CONFIG_LEGACY_VSYSCALL_EMULATE is, as far as I know, only needed for the
combined use of exotic and outdated debugging mechanisms with outdated
binaries. At this point, no one should be using it. Eventually, dynamic
switching of vsyscalls will be implemented, but this is much more
complicated to support in EMULATE mode than XONLY mode.

So let's force all the distros off of EMULATE mode. If anyone actually
needs it, they can set vsyscall=emulate, and the kernel can then get
away with refusing to support newer security models if that option is
set.

[ bp: Remove "we"s. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Florian Weimer <fweimer@redhat.com>
Link: https://lore.kernel.org/r/898932fe61db6a9d61bc2458fa2f6049f1ca9f5c.1652290558.git.luto@kernel.org


# c7bda0dc 13-Jan-2022 Borislav Petkov <bp@suse.de>

x86: Remove a.out support

Commit

eac616557050 ("x86: Deprecate a.out support")

deprecated a.out support with the promise to remove it a couple of
releases later. That commit landed in v5.1.

Now it is more than a couple of releases later, no one has complained so
remove it.

Fold in a hunk removing the reference to arch/x86/ia32/ia32_aout.c in
MAINTAINERS:

https://lore.kernel.org/r/20220316050828.17255-1-lukas.bulwahn@gmail.com

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220113160115.5375-1-bp@alien8.de


# 968b4931 05-Apr-2022 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Make DMA memory shared for TD guest

Intel TDX doesn't allow VMM to directly access guest private memory.
Any memory that is required for communication with the VMM must be
shared explicitly. The same rule applies for any DMA to and from the
TDX guest. All DMA pages have to be marked as shared pages. A generic way
to achieve this without any changes to device drivers is to use the
SWIOTLB framework.

The previous patch ("Add support for TDX shared memory") gave TDX guests
the _ability_ to make some pages shared, but did not make any pages
shared. This actually marks SWIOTLB buffers *as* shared.

Start returning true for cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) in
TDX guests. This has several implications:

- Allows the existing mem_encrypt_init() to be used for TDX which
sets SWIOTLB buffers shared (aka. "decrypted").
- Ensures that all DMA is routed via the SWIOTLB mechanism (see
pci_swiotlb_detect())

Stop selecting DYNAMIC_PHYSICAL_MASK directly. It will get set
indirectly by selecting X86_MEM_ENCRYPT.

mem_encrypt_init() is currently under an AMD-specific #ifdef. Move it to
a generic area of the header.

Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220405232939.73860-28-kirill.shutemov@linux.intel.com


# 77a512e3 05-Apr-2022 Sean Christopherson <seanjc@google.com>

x86/boot: Avoid #VE during boot for TDX platforms

There are a few MSRs and control register bits that the kernel
normally needs to modify during boot. But, TDX disallows
modification of these registers to help provide consistent security
guarantees. Fortunately, TDX ensures that these are all in the correct
state before the kernel loads, which means the kernel does not need to
modify them.

The conditions to avoid are:

* Any writes to the EFER MSR
* Clearing CR4.MCE

This theoretically makes the guest boot more fragile. If, for instance,
EFER was set up incorrectly and a WRMSR was performed, it will trigger
early exception panic or a triple fault, if it's before early
exceptions are set up. However, this is likely to trip up the guest
BIOS long before control reaches the kernel. In any case, these kinds
of problems are unlikely to occur in production environments, and
developers have good debug tools to fix them quickly.

Change the common boot code to work on TDX and non-TDX systems.
This should have no functional effect on non-TDX systems.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20220405232939.73860-24-kirill.shutemov@linux.intel.com


# 65fab5bc 05-Apr-2022 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/tdx: Exclude shared bit from __PHYSICAL_MASK

In TDX guests, by default memory is protected from host access. If a
guest needs to communicate with the VMM (like the I/O use case), it uses
a single bit in the physical address to communicate the protected/shared
attribute of the given page.

In the x86 ARCH code, __PHYSICAL_MASK macro represents the width of the
physical address in the given architecture. It is used in creating
physical PAGE_MASK for address bits in the kernel. Since in TDX guest,
a single bit is used as metadata, it needs to be excluded from valid
physical address bits to avoid using incorrect addresses bits in the
kernel.

Enable DYNAMIC_PHYSICAL_MASK to support updating the __PHYSICAL_MASK.

Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20220405232939.73860-6-kirill.shutemov@linux.intel.com


# 41394e33 05-Apr-2022 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/tdx: Extend the confidential computing API to support TDX guests

Confidential Computing (CC) features (like string I/O unroll support,
memory encryption/decryption support, etc) are conditionally enabled
in the kernel using cc_platform_has() API. Since TDX guests also need
to use these CC features, extend cc_platform_has() API and add TDX
guest-specific CC attributes support.

CC API also provides an interface to deal with encryption mask. Extend
it to cover TDX.

Details about which bit in the page table entry to be used to indicate
shared/private state is determined by using the TDINFO TDCALL.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20220405232939.73860-5-kirill.shutemov@linux.intel.com


# 59bd54a8 05-Apr-2022 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

x86/tdx: Detect running as a TDX guest in early boot

In preparation of extending cc_platform_has() API to support TDX guest,
use CPUID instruction to detect support for TDX guests in the early
boot code (via tdx_early_init()). Since copy_bootdata() is the first
user of cc_platform_has() API, detect the TDX guest status before it.

Define a synthetic feature flag (X86_FEATURE_TDX_GUEST) and set this
bit in a valid TDX guest platform.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20220405232939.73860-2-kirill.shutemov@linux.intel.com


# dbae0a93 26-Jan-2022 Borislav Petkov <bp@suse.de>

x86/cpu: Remove CONFIG_X86_SMAP and "nosmap"

Those were added as part of the SMAP enablement but SMAP is currently
an integral part of kernel proper and there's no need to disable it
anymore.

Rip out that functionality. Leave --uaccess default on for objtool as
this is what objtool should do by default anyway.

If still needed - clearcpuid=smap.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220127115626.14179-4-bp@alien8.de


# 4cdfc11b 17-Apr-2022 Nur Hussein <hussein@unixcat.org>

x86/Kconfig: fix the spelling of 'becoming' in X86_KERNEL_IBT config

There is only one m in becoming.

Signed-off-by: Nur Hussein <hussein@unixcat.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220417192454.10247-1-hussein@unixcat.org


# 7dd5ad2d 31-Mar-2022 Thomas Gleixner <tglx@linutronix.de>

Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels"

Revert commit bf9ad37dc8a. It needs to be better encapsulated and
generalized.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>


# f3a112c0 25-Mar-2022 Masami Hiramatsu <mhiramat@kernel.org>

x86,rethook,kprobes: Replace kretprobe with rethook on x86

Replaces the kretprobe code with rethook on x86. With this patch,
kretprobe on x86 uses the rethook instead of kretprobe specific
trampoline code.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/164826163692.2455864.13745421016848209527.stgit@devnote2


# f6a2c2b2 18-Mar-2022 Nathan Chancellor <nathan@kernel.org>

x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0

With CONFIG_X86_KERNEL_IBT=y and a version of ld.lld prior to 14.0.0,
there are numerous objtool warnings along the lines of:

warning: objtool: .plt+0x6: indirect jump found in RETPOLINE build

This is a known issue that has been resolved in ld.lld 14.0.0. Prevent
CONFIG_X86_KERNEL_IBT from being selectable when using one of these
problematic ld.lld versions.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220318230747.3900772-3-nathan@kernel.org


# 262448f3 18-Mar-2022 Nathan Chancellor <nathan@kernel.org>

x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0

Commit 156ff4a544ae ("x86/ibt: Base IBT bits") added a check for a crash
with 'clang -fcf-protection=branch -mfentry -pg', which intended to
exclude Clang versions older than 14.0.0 from selecting
CONFIG_X86_KERNEL_IBT.

clang-11 does not have the issue that the check is testing for, so
CONFIG_X86_KERNEL_IBT is selectable. Unfortunately, there is a different
crash in clang-11 that was fixed in clang-12. To make matters worse,
that crash does not appear to be entirely deterministic, as the same
input to the compiler will sometimes crash and other times not, which
makes dynamically checking for the crash like the '-pg' one unreliable.

To make everything work properly for all common versions of clang, use a
hard version check of 14.0.0, as that will be the first release upstream
that has both bugs properly fixed.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220318230747.3900772-2-nathan@kernel.org


# aaeed6ec 14-Mar-2022 Nathan Chancellor <nathan@kernel.org>

x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy

There are two outstanding issues with CONFIG_X86_X32_ABI and
llvm-objcopy, with similar root causes:

1. llvm-objcopy does not properly convert .note.gnu.property when going
from x86_64 to x86_x32, resulting in a corrupted section when
linking:

https://github.com/ClangBuiltLinux/linux/issues/1141

2. llvm-objcopy produces corrupted compressed debug sections when going
from x86_64 to x86_x32, also resulting in an error when linking:

https://github.com/ClangBuiltLinux/linux/issues/514

After commit 41c5ef31ad71 ("x86/ibt: Base IBT bits"), the
.note.gnu.property section is always generated when
CONFIG_X86_KERNEL_IBT is enabled, which causes the first issue to become
visible with an allmodconfig build:

ld.lld: error: arch/x86/entry/vdso/vclock_gettime-x32.o:(.note.gnu.property+0x1c): program property is too short

To avoid this error, do not allow CONFIG_X86_X32_ABI to be selected when
using llvm-objcopy. If the two issues ever get fixed in llvm-objcopy,
this can be turned into a feature check.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220314194842.3452-3-nathan@kernel.org


# 83a44a4f 14-Mar-2022 Masahiro Yamada <masahiroy@kernel.org>

x86: Remove toolchain check for X32 ABI capability

Commit 0bf6276392e9 ("x32: Warn and disable rather than error if
binutils too old") added a small test in arch/x86/Makefile because
binutils 2.22 or newer is needed to properly support elf32-x86-64. This
check is no longer necessary, as the minimum supported version of
binutils is 2.23, which is enforced at configuration time with
scripts/min-tool-version.sh.

Remove this check and replace all uses of CONFIG_X86_X32 with
CONFIG_X86_X32_ABI, as two symbols are no longer necessary.

[nathan: Rebase, fix up a few places where CONFIG_X86_X32 was still
used, and simplify commit message to satisfy -tip requirements]

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220314194842.3452-2-nathan@kernel.org


# ed53a0d9 08-Mar-2022 Peter Zijlstra <peterz@infradead.org>

x86/alternative: Use .ibt_endbr_seal to seal indirect calls

Objtool's --ibt option generates .ibt_endbr_seal which lists
superfluous ENDBR instructions. That is those instructions for which
the function is never indirectly called.

Overwrite these ENDBR instructions with a NOP4 such that these
function can never be indirect called, reducing the number of viable
ENDBR targets in the kernel.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.822545231@infradead.org


# 156ff4a5 08-Mar-2022 Peter Zijlstra <peterz@infradead.org>

x86/ibt: Base IBT bits

Add Kconfig, Makefile and basic instruction support for x86 IBT.

(Ab)use __DISABLE_EXPORTS to disable IBT since it's already employed
to mark compressed and purgatory. Additionally mark realmode with it
as well to avoid inserting ENDBR instructions there. While ENDBR is
technically a NOP, inserting them was causing some grief due to code
growth. There's also a problem with using __noendbr in code compiled
without -fcf-protection=branch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.519875203@infradead.org


# 24e988c7 24-Mar-2022 Anshuman Khandual <anshuman.khandual@arm.com>

mm: generalize ARCH_HAS_FILTER_PGPROT

ARCH_HAS_FILTER_PGPROT config has duplicate definitions on platforms that
subscribe it. Instead make it a generic config option which can be
selected on applicable platforms when required.

Link: https://lkml.kernel.org/r/1643004823-16441-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4e8ca134 21-Mar-2022 Alexei Starovoitov <ast@kernel.org>

Revert "rethook: x86: Add rethook x86 implementation"

This reverts commit 75caf33eda242e2f34f61e475d666359749ae5ff.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 75caf33e 15-Mar-2022 Masami Hiramatsu <mhiramat@kernel.org>

rethook: x86: Add rethook x86 implementation

Add rethook for x86 implementation. Most of the code has been copied from
kretprobes on x86.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735286243.1084943.7477055110527046644.stgit@devnote2


# eed1fcee 02-Mar-2022 Song Liu <song@kernel.org>

x86: Disable HAVE_ARCH_HUGE_VMALLOC on 32-bit x86

kernel test robot reported kernel BUG like:

[ 44.587744][ T1] kernel BUG at arch/x86/mm/physaddr.c:76!
[ 44.590151][ T1] __vmalloc_area_node (mm/vmalloc.c:622 mm/vmalloc.c:2995)
[ 44.590151][ T1] __vmalloc_node_range (mm/vmalloc.c:3108)
[ 44.590151][ T1] __vmalloc_node (mm/vmalloc.c:3157)

which is triggered with HAVE_ARCH_HUGE_VMALLOC on 32-bit x86. Since BPF
only uses HAVE_ARCH_HUGE_VMALLOC for x86_64, turn it off for 32-bit x86.

Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220302175126.247459-2-song@kernel.org


# fac54e2b 04-Feb-2022 Song Liu <songliubraving@fb.com>

x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP

This enables module_alloc() to allocate huge page for 2MB+ requests.
To check the difference of this change, we need enable config
CONFIG_PTDUMP_DEBUGFS, and call module_alloc(2MB). Before the change,
/sys/kernel/debug/page_tables/kernel shows pte for this map. With the
change, /sys/kernel/debug/page_tables/ show pmd for thie map.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220204185742.271030-2-song@kernel.org


# 07431506 22-Mar-2022 Anshuman Khandual <anshuman.khandual@arm.com>

mm/hugetlb: generalize ARCH_WANT_GENERAL_HUGETLB

ARCH_WANT_GENERAL_HUGETLB config has duplicate definitions on platforms
that subscribe it. Instead make it a generic config option which can be
selected on applicable platforms when required.

Link: https://lkml.kernel.org/r/1643718465-4324-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 99cf983c 14-Feb-2022 Mark Rutland <mark.rutland@arm.com>

sched/preempt: Add PREEMPT_DYNAMIC using static keys

Where an architecture selects HAVE_STATIC_CALL but not
HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
which will either branch to a callee or return to the caller.

On such architectures, a number of constraints can conspire to make
those trampolines more complicated and potentially less useful than we'd
like. For example:

* Hardware and software control flow integrity schemes can require the
addition of "landing pad" instructions (e.g. `BTI` for arm64), which
will also be present at the "real" callee.

* Limited branch ranges can require that trampolines generate or load an
address into a register and perform an indirect branch (or at least
have a slow path that does so). This loses some of the benefits of
having a direct branch.

* Interaction with SW CFI schemes can be complicated and fragile, e.g.
requiring that we can recognise idiomatic codegen and remove
indirections understand, at least until clang proves more helpful
mechanisms for dealing with this.

For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
really only need to enable/disable specific preemption functions. We can
achieve the same effect without a number of the pain points above by
using static keys to fold early returns into the preemption functions
themselves rather than in an out-of-line trampoline, effectively
inlining the trampoline into the start of the function.

For arm64, this results in good code generation. For example, the
dynamic_cond_resched() wrapper looks as follows when enabled. When
disabled, the first `B` is replaced with a `NOP`, resulting in an early
return.

| <dynamic_cond_resched>:
| bti c
| b <dynamic_cond_resched+0x10> // or `nop`
| mov w0, #0x0
| ret
| mrs x0, sp_el0
| ldr x0, [x0, #8]
| cbnz x0, <dynamic_cond_resched+0x8>
| paciasp
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl <preempt_schedule_common>
| mov w0, #0x1
| ldp x29, x30, [sp], #16
| autiasp
| ret

... compared to the regular form of the function:

| <__cond_resched>:
| bti c
| mrs x0, sp_el0
| ldr x1, [x0, #8]
| cbz x1, <__cond_resched+0x18>
| mov w0, #0x0
| ret
| paciasp
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl <preempt_schedule_common>
| mov w0, #0x1
| ldp x29, x30, [sp], #16
| autiasp
| ret

Any architecture which implements static keys should be able to use this
to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
calls. Since this is likely to have greater overhead than (inlined)
static calls, PREEMPT_DYNAMIC is only defaulted to enabled when
HAVE_PREEMPT_DYNAMIC_CALL is selected.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-6-mark.rutland@arm.com


# bf9ad37d 14-Jul-2015 Oleg Nesterov <oleg@redhat.com>

signal, x86: Delay calling signals in atomic on RT enabled kernels

On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.

When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
one of these is the spin lock used in signal handling.

Some of the debug code (int3) causes do_trap() to send a signal.
This function calls a spinlock_t lock that has been converted to a
sleeping lock. If this happens, the above issues with the corrupted
stack is possible.

Instead of calling the signal right away, for PREEMPT_RT and x86,
the signal information is stored on the stacks task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.

[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
[bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
[ tglx: Use a config option ]

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/Ygq5aBB/qMQw6aP5@linutronix.de


# a7a6f65a 12-Feb-2022 Mateusz Jończyk <mat.jonczyk@o2.pl>

x86/Kconfig: move and modify CONFIG_I8K

In Kconfig, inside the "Processor type and features" menu, there is
the CONFIG_I8K option: "Dell i8k legacy laptop support". This is
very confusing - enabling CONFIG_I8K is not required for the kernel to
support old Dell laptops. This option is specific to the dell-smm-hwmon
driver, which mostly exports some hardware monitoring information and
allows the user to change fan speed.

This option is misplaced, so move CONFIG_I8K to drivers/hwmon/Kconfig,
where it belongs.

Also, modify the dependency order - change
select SENSORS_DELL_SMM
to
depends on SENSORS_DELL_SMM
as it is just a configuration option of dell-smm-hwmon. This includes
changing the option type from tristate to bool. It was tristate because
it could select CONFIG_SENSORS_DELL_SMM=m .

When running "make oldconfig" on configurations with
CONFIG_SENSORS_DELL_SMM enabled , this change will result in an
additional question (which could be printed several times during
bisecting). I think that tidying up the configuration is worth it,
though.

Next patch tweaks the description of CONFIG_I8K.

Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Cc: Pali Rohár <pali@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Mark Gross <markgross@kernel.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220212125654.357408-1-mat.jonczyk@o2.pl
Signed-off-by: Guenter Roeck <linux@roeck-us.net>


# 2792d84e 16-Feb-2022 Kees Cook <keescook@chromium.org>

usercopy: Check valid lifetime via stack depth

One of the things that CONFIG_HARDENED_USERCOPY sanity-checks is whether
an object that is about to be copied to/from userspace is overlapping
the stack at all. If it is, it performs a number of inexpensive
bounds checks. One of the finer-grained checks is whether an object
crosses stack frames within the stack region. Doing this on x86 with
CONFIG_FRAME_POINTER was cheap/easy. Doing it with ORC was deemed too
heavy, and was left out (a while ago), leaving the courser whole-stack
check.

The LKDTM tests USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM
try to exercise these cross-frame cases to validate the defense is
working. They have been failing ever since ORC was added (which was
expected). While Muhammad was investigating various LKDTM failures[1],
he asked me for additional details on them, and I realized that when
exact stack frame boundary checking is not available (i.e. everything
except x86 with FRAME_POINTER), it could check if a stack object is at
least "current depth valid", in the sense that any object within the
stack region but not between start-of-stack and current_stack_pointer
should be considered unavailable (i.e. its lifetime is from a call no
longer present on the stack).

Introduce ARCH_HAS_CURRENT_STACK_POINTER to track which architectures
have actually implemented the common global register alias.

Additionally report usercopy bounds checking failures with an offset
from current_stack_pointer, which may assist with diagnosing failures.

The LKDTM USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM tests
(once slightly adjusted in a separate patch) pass again with this fixed.

[1] https://github.com/kernelci/kernelci-project/issues/84

Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v1: https://lore.kernel.org/lkml/20220216201449.2087956-1-keescook@chromium.org
v2: https://lore.kernel.org/lkml/20220224060342.1855457-1-keescook@chromium.org
v3: https://lore.kernel.org/lkml/20220225173345.3358109-1-keescook@chromium.org
v4: - improve commit log (akpm)


# 4eda2bc3 29-Sep-2021 David Hildenbrand <david@redhat.com>

x86/Kconfig: Select ARCH_SELECT_MEMORY_MODEL only if FLATMEM and SPARSEMEM are possible

x86-64 supports only CONFIG_SPARSEMEM; there is nothing users can select.
So enable the memory model selection (via CONFIG_ARCH_SELECT_MEMORY_MODEL)
only if both, SPARSEMEM and FLATMEM are possible, which isn't the case
on x86-64.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20210929144321.50411-1-david@redhat.com


# 4ed308c4 25-Jan-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Have architectures opt-in for mcount build time sorting

First S390 complained that the sorting of the mcount sections at build
time caused the kernel to crash on their architecture. Now PowerPC is
complaining about it too. And also ARM64 appears to be having issues.

It may be necessary to also update the relocation table for the values
in the mcount table. Not only do we have to sort the table, but also
update the relocations that may be applied to the items in the table.

If the system is not relocatable, then it is fine to sort, but if it is,
some architectures may have issues (although x86 does not as it shifts all
addresses the same).

Add a HAVE_BUILDTIME_MCOUNT_SORT that an architecture can set to say it is
safe to do the sorting at build time.

Also update the config to compile in build time sorting in the sorttable
code in scripts/ to depend on CONFIG_BUILDTIME_MCOUNT_SORT.

Link: https://lore.kernel.org/all/944D10DA-8200-4BA9-8D0A-3BED9AA99F82@linux.ibm.com/
Link: https://lkml.kernel.org/r/20220127153821.3bc1ac6e@gandalf.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Yinan Liu <yinan@linux.alibaba.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Fixes: 72b3942a173c ("scripts: ftrace - move the sort-processing in ftrace_init")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# c126a53c 14-Aug-2021 Yury Norov <yury.norov@gmail.com>

arch: remove GENERIC_FIND_FIRST_BIT entirely

In 5.12 cycle we enabled GENERIC_FIND_FIRST_BIT config option for ARM64
and MIPS. It increased performance and shrunk .text size; and so far
I didn't receive any negative feedback on the change.

https://lore.kernel.org/linux-arch/20210225135700.1381396-1-yury.norov@gmail.com/

Now I think it's a good time to switch all architectures to use
find_{first,last}_bit() unconditionally, and so remove corresponding
config option.

The patch does't introduce functioal changes for arc, arm, arm64, mips,
m68k, s390 and x86, for other architectures I expect improvement both in
performance and .text size.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Alexander Lobakin <alobakin@pm.me> (mips)
Reviewed-by: Alexander Lobakin <alobakin@pm.me> (mips)
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>


# bece04b5 19-Jan-2022 Marco Elver <elver@google.com>

kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR

Until recent versions of GCC and Clang, it was not possible to disable
KCOV instrumentation via a function attribute. The relevant function
attribute was introduced in 540540d06e9d9 ("kcov: add
__no_sanitize_coverage to fix noinstr for all architectures").

x86 was the first architecture to want a working noinstr, and at the
time no compiler support for the attribute existed yet. Therefore,
commit 0f1441b44e823 ("objtool: Fix noinstr vs KCOV") introduced the
ability to NOP __sanitizer_cov_*() calls in .noinstr.text.

However, this doesn't work for other architectures like arm64 and s390
that want a working noinstr per ARCH_WANTS_NO_INSTR.

At the time of 0f1441b44e823, we didn't yet have ARCH_WANTS_NO_INSTR,
but now we can move the Kconfig dependency checks to the generic KCOV
option. KCOV will be available if:

- architecture does not care about noinstr, OR
- we have objtool support (like on x86), OR
- GCC is 12.0 or newer, OR
- Clang is 13.0 or newer.

Link: https://lkml.kernel.org/r/20211201152604.3984495-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7ecd19cf 19-Jan-2022 Kefeng Wang <wangkefeng.wang@huawei.com>

mm: percpu: generalize percpu related config

Patch series "mm: percpu: Cleanup percpu first chunk function".

When supporting page mapping percpu first chunk allocator on arm64, we
found there are lots of duplicated codes in percpu embed/page first chunk
allocator. This patchset is aimed to cleanup them and should no function
change.

The currently supported status about 'embed' and 'page' in Archs shows
below,

embed: NEED_PER_CPU_PAGE_FIRST_CHUNK
page: NEED_PER_CPU_EMBED_FIRST_CHUNK

embed page
------------------------
arm64 Y Y
mips Y N
powerpc Y Y
riscv Y N
sparc Y Y
x86 Y Y
------------------------

There are two interfaces about percpu first chunk allocator,

extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
size_t atom_size,
pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn);
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);

extern int __init pcpu_page_first_chunk(size_t reserved_size,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn,
- pcpu_fc_populate_pte_fn_t populate_pte_fn);
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);

The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t is killed, we provide generic
pcpu_fc_alloc() and pcpu_fc_free() function, which are called in the
pcpu_embed/page_first_chunk().

1) For pcpu_embed_first_chunk(), pcpu_fc_cpu_to_node_fn_t is needed to be
provided when archs supported NUMA.

2) For pcpu_page_first_chunk(), the pcpu_fc_populate_pte_fn_t is killed too,
a generic pcpu_populate_pte() which marked '__weak' is provided, if you
need a different function to populate pte on the arch(like x86), please
provide its own implementation.

[1] https://github.com/kevin78/linux.git percpu-cleanup

This patch (of 4):

The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs, which have
duplicate definitions on platforms that subscribe it.

Move them into mm, drop these redundant definitions and instead just
select it on applicable platforms.

Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d283d422 14-Jan-2022 Pasha Tatashin <pasha.tatashin@soleen.com>

x86: mm: add x86_64 support for page table check

Add page table check hooks into routines that modify user page tables.

Link: https://lkml.kernel.org/r/20211221154650.1047963-5-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c6dbd3e5 15-Nov-2021 Peter Zijlstra <peterz@infradead.org>

x86/mmx_32: Remove X86_USE_3DNOW

This code puts an exception table entry on the PREFETCH instruction to
overwrite it with a JMP.d8 when it triggers an exception. Except of
course, our code is no longer writable, also SMP.

Instead of fixing this broken mess, simply take it out.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/YZKQzUmeNuwyvZpk@hirez.programming.kicks-ass.net


# e463a09a 04-Dec-2021 Peter Zijlstra <peterz@infradead.org>

x86: Add straight-line-speculation mitigation

Make use of an upcoming GCC feature to mitigate
straight-line-speculation for x86:

https://gcc.gnu.org/g:53a643f8568067d7700a9f2facc8ba39974973d3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
https://bugs.llvm.org/show_bug.cgi?id=52323

It's built tested on x86_64-allyesconfig using GCC-12 and GCC-11.

Maintenance overhead of this should be fairly low due to objtool
validation.

Size overhead of all these additional int3 instructions comes to:

text data bss dec hex filename
22267751 6933356 2011368 31212475 1dc43bb defconfig-build/vmlinux
22804126 6933356 1470696 31208178 1dc32f2 defconfig-build/vmlinux.sls

Or roughly 2.4% additional text.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211204134908.140103474@infradead.org


# 50468e43 16-Nov-2021 Jarkko Sakkinen <jarkko@kernel.org>

x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node

== Problem ==

The amount of SGX memory on a system is determined by the BIOS and it
varies wildly between systems. It can be as small as dozens of MB's
and as large as many GB's on servers. Just like how applications need
to know how much regular RAM is available, enclave builders need to
know how much SGX memory an enclave can consume.

== Solution ==

Introduce a new sysfs file:

/sys/devices/system/node/nodeX/x86/sgx_total_bytes

to enumerate the amount of SGX memory available in each NUMA node.
This serves the same function for SGX as /proc/meminfo or
/sys/devices/system/node/nodeX/meminfo does for normal RAM.

'sgx_total_bytes' is needed today to help drive the SGX selftests.
SGX-specific swap code is exercised by creating overcommitted enclaves
which are larger than the physical SGX memory on the system. They
currently use a CPUID-based approach which can diverge from the actual
amount of SGX memory available. 'sgx_total_bytes' ensures that the
selftests can work efficiently and do not attempt stupid things like
creating a 100,000 MB enclave on a system with 128 MB of SGX memory.

== Implementation Details ==

Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an
arch specific attribute group, and add an attribute for the amount of
SGX memory in bytes to each NUMA node:

== ABI Design Discussion ==

As opposed to the per-node ABI, a single, global ABI was considered.
However, this would prevent enclaves from being able to size
themselves so that they fit on a single NUMA node. Essentially, a
single value would rule out NUMA optimizations for enclaves.

Create a new "x86/" directory inside each "nodeX/" sysfs directory.
'sgx_total_bytes' is expected to be the first of at least a few
sgx-specific files to be placed in the new directory. Just scanning
/proc/meminfo, these are the no-brainers that we have for RAM, but we
need for SGX:

MemTotal: xxxx kB // sgx_total_bytes (implemented here)
MemFree: yyyy kB // sgx_free_bytes
SwapTotal: zzzz kB // sgx_swapped_bytes

So, at *least* three. I think we will eventually end up needing
something more along the lines of a dozen. A new directory (as
opposed to being in the nodeX/ "root") directory avoids cluttering the
root with several "sgx_*" files.

Place the new file in a new "nodeX/x86/" directory because SGX is
highly x86-specific. It is very unlikely that any other architecture
(or even non-Intel x86 vendor) will ever implement SGX. Using "sgx/"
as opposed to "x86/" was also considered. But, there is a real chance
this can get used for other arch-specific purposes.

[ dhansen: rewrite changelog ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org


# 20f07a04 06-Dec-2021 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/sev: Move common memory encryption code to mem_encrypt.c

SEV and TDX both protect guest memory from host accesses. They both use
guest physical address bits to communicate to the hardware which pages
receive protection or not. SEV and TDX both assume that all I/O (real
devices and virtio) must be performed to pages *without* protection.

To add this support, AMD SEV code forces force_dma_unencrypted() to
decrypt DMA pages when DMA pages were allocated for I/O. It also uses
swiotlb_update_mem_attributes() to update decryption bits in SWIOTLB DMA
buffers.

Since TDX also uses a similar memory sharing design, all the above
mentioned changes can be reused. So move force_dma_unencrypted(),
SWIOTLB update code and virtio changes out of mem_encrypt_amd.c to
mem_encrypt.c.

Introduce a new config option X86_MEM_ENCRYPT that can be selected by
platforms which use x86 memory encryption features (needed in both AMD
SEV and Intel TDX guest platforms).

Since the code is moved from mem_encrypt_amd.c, inherit the same make
flags.

This is preparation for enabling TDX memory encryption support and it
has no functional changes.

Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20211206135505.75045-4-kirill.shutemov@linux.intel.com


# 40e0e784 26-Oct-2021 Tony Luck <tony.luck@intel.com>

x86/sgx: Add infrastructure to identify SGX EPC pages

X86 machine check architecture reports a physical address when there
is a memory error. Handling that error requires a method to determine
whether the physical address reported is in any of the areas reserved
for EPC pages by BIOS.

SGX EPC pages do not have Linux "struct page" associated with them.

Keep track of the mapping from ranges of EPC pages to the sections
that contain them using an xarray. N.B. adds CONFIG_XARRAY_MULTI to
the SGX dependecies. So "select" that in arch/x86/Kconfig for X86/SGX.

Create a function arch_is_platform_page() that simply reports whether an
address is an EPC page for use elsewhere in the kernel. The ACPI error
injection code needs this function and is typically built as a module,
so export it.

Note that arch_is_platform_page() will be slower than other similar
"what type is this page" functions that can simply check bits in the
"struct page". If there is some future performance critical user of
this function it may need to be implemented in a more efficient way.

Note also that the current implementation of xarray allocates a few
hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
configured. This isn't ideal, but worth it for the code simplicity.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-3-tony.luck@intel.com


# 1ff2fc02 20-Oct-2021 Tom Lendacky <thomas.lendacky@amd.com>

x86/sme: Explicitly map new EFI memmap table as encrypted

Reserving memory using efi_mem_reserve() calls into the x86
efi_arch_mem_reserve() function. This function will insert a new EFI
memory descriptor into the EFI memory map representing the area of
memory to be reserved and marking it as EFI runtime memory. As part
of adding this new entry, a new EFI memory map is allocated and mapped.
The mapping is where a problem can occur. This new memory map is mapped
using early_memremap() and generally mapped encrypted, unless the new
memory for the mapping happens to come from an area of memory that is
marked as EFI_BOOT_SERVICES_DATA memory. In this case, the new memory will
be mapped unencrypted. However, during replacement of the old memory map,
efi_mem_type() is disabled, so the new memory map will now be long-term
mapped encrypted (in efi.memmap), resulting in the map containing invalid
data and causing the kernel boot to crash.

Since it is known that the area will be mapped encrypted going forward,
explicitly map the new memory map as encrypted using early_memremap_prot().

Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 8f716c9b5feb ("x86/mm: Add support to access boot related data in the clear")
Link: https://lore.kernel.org/all/ebf1eb2940405438a09d51d121ec0d02c8755558.1634752931.git.thomas.lendacky@amd.com/
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ardb: incorporate Kconfig fix by Arnd]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 503e4510 15-Nov-2021 Heiko Carstens <hca@linux.ibm.com>

ftrace/samples: add missing Kconfig option for ftrace direct multi sample

Currently it is not possible to build the ftrace direct multi example
anymore due to broken config dependencies. Fix this by adding
SAMPLE_FTRACE_DIRECT_MULTI config option.

This broke when merging s390-5.16-1 due to an incorrect merge conflict
resolution proposed by me.

Also rename SAMPLE_FTRACE_MULTI_DIRECT to SAMPLE_FTRACE_DIRECT_MULTI
so it matches the module name.

Fixes: 0b707e572a19 ("Merge tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux")
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211115195614.3173346-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 5c11f00b 05-Nov-2021 David Hildenbrand <david@redhat.com>

x86: remove memory hotplug support on X86_32

CONFIG_MEMORY_HOTPLUG was marked BROKEN over one year and we just
restricted it to 64 bit. Let's remove the unused x86 32bit
implementation and simplify the Kconfig.

Link: https://lkml.kernel.org/r/20210929143600.49379-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c316eb44 12-Oct-2021 Heiko Carstens <hca@linux.ibm.com>

samples: add HAVE_SAMPLE_FTRACE_DIRECT config option

Add HAVE_SAMPLE_FTRACE_DIRECT config option which can be selected by
architectures which have support for ftrace direct call samples.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211012133802.2460757-4-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# 1f6d3a8f 25-Oct-2021 Masami Hiramatsu <mhiramat@kernel.org>

kprobes: Add a test case for stacktrace from kretprobe handler

Add a test case for stacktrace from kretprobe handler and
nested kretprobe handlers.

This test checks both of stack trace inside kretprobe handler
and stack trace from pt_regs. Those stack trace must include
actual function return address instead of kretprobe trampoline.
The nested kretprobe stacktrace test checks whether the unwinder
can correctly unwind the call frame on the stack which has been
modified by the kretprobe.

Since the stacktrace on kretprobe is correctly fixed only on x86,
this introduces a meta kconfig ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
which tells user that the stacktrace on kretprobe is correct or not.

The test results will be shown like below;

TAP version 14
1..1
# Subtest: kprobes_test
1..6
ok 1 - test_kprobe
ok 2 - test_kprobes
ok 3 - test_kretprobe
ok 4 - test_kretprobes
ok 5 - test_stacktrace_on_kretprobe
ok 6 - test_stacktrace_on_nested_kretprobe
# kprobes_test: pass:6 fail:0 skip:0 total:6
# Totals: pass:6 fail:0 skip:0 total:6
ok 1 - kprobes_test

Link: https://lkml.kernel.org/r/163516211244.604541.18350507860972214415.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 3aac3ebe 21-Oct-2021 Thomas Gleixner <tglx@linutronix.de>

x86/signal: Implement sigaltstack size validation

For historical reasons MINSIGSTKSZ is a constant which became already too
small with AVX512 support.

Add a mechanism to enforce strict checking of the sigaltstack size against
the real size of the FPU frame.

The strict check can be enabled via a config option and can also be
controlled via the kernel command line option 'strict_sas_size' independent
of the config switch.

Enabling it might break existing applications which allocate a too small
sigaltstack but 'work' because they never get a signal delivered. Though it
can be handy to filter out binaries which are not yet aware of
AT_MINSIGSTKSZ.

Also the upcoming support for dynamically enabled FPU features requires a
strict sanity check to ensure that:

- Enabling of a dynamic feature, which changes the sigframe size fits
into an enabled sigaltstack

- Installing a too small sigaltstack after a dynamic feature has been
added is not possible.

Implement the base check which is controlled by config and command line
options.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211021225527.10184-3-chang.seok.bae@intel.com


# 4a30e4c9 20-Oct-2021 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace/x86_64: Have function graph tracer depend on DYNAMIC_FTRACE

The function graph tracer is going to now depend on
ARCH_SUPPORTS_FTRACE_OPS, as that also means that it can support ftrace
args. Since ARCH_SUPPORTS_FTRACE_OPS depends on DYNAMIC_FTRACE, this
means that the function graph tracer for x86_64 will need to depend on
DYNAMIC_FTRACE.

Link: https://lkml.kernel.org/r/20211020233555.16b0dbf2@rorschach.local.home

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 66558b73 24-Sep-2021 Tim Chen <tim.c.chen@linux.intel.com>

sched: Add cluster scheduler level for x86

There are x86 CPU architectures (e.g. Jacobsville) where L2 cahce is
shared among a cluster of cores instead of being exclusive to one
single core.

To prevent oversubscription of L2 cache, load should be balanced
between such L2 clusters, especially for tasks with no shared data.
On benchmark such as SPECrate mcf test, this change provides a boost
to performance especially on medium load system on Jacobsville. on a
Jacobsville that has 24 Atom cores, arranged into 6 clusters of 4
cores each, the benchmark number is as follow:

Improvement over baseline kernel for mcf_r
copies run time base rate
1 -0.1% -0.2%
6 25.1% 25.1%
12 18.8% 19.0%
24 0.3% 0.3%

So this looks pretty good. In terms of the system's task distribution,
some pretty bad clumping can be seen for the vanilla kernel without
the L2 cluster domain for the 6 and 12 copies case. With the extra
domain for cluster, the load does get evened out between the clusters.

Note this patch isn't an universal win as spreading isn't necessarily
a win, particually for those workload who can benefit from packing.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210924085104.44806-4-21cnbao@gmail.com


# 3fd3590b 03-Aug-2021 Lukas Bulwahn <lukas.bulwahn@gmail.com>

x86/Kconfig: Remove references to obsolete Kconfig symbols

Remove two symbols referenced in Kconfig which have been removed
previously by:

ef3c67b6454b ("mfd: intel_msic: Remove driver for deprecated platform")
1b79fc4f2bfd ("x86/apb_timer: Remove driver for deprecated platform")

Detected by scripts/checkkconfigsymbols.py.

[ bp: Merge into a single patch. ]

Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210803113531.30720-5-lukas.bulwahn@gmail.com


# aa5a4611 08-Sep-2021 Tom Lendacky <thomas.lendacky@amd.com>

x86/sev: Add an x86 version of cc_platform_has()

Introduce an x86 version of the cc_platform_has() function. This will be
used to replace vendor specific calls like sme_active(), sev_active(),
etc.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-4-bp@alien8.de


# ef775a0e 10-Sep-2021 Randy Dunlap <rdunlap@infradead.org>

x86/Kconfig: Fix an unused variable error in dell-smm-hwmon

When CONFIG_PROC_FS is not set, there is a build warning (turned
into an error):

../drivers/hwmon/dell-smm-hwmon.c: In function 'i8k_init_procfs':
../drivers/hwmon/dell-smm-hwmon.c:624:24: error: unused variable 'data' [-Werror=unused-variable]
struct dell_smm_data *data = dev_get_drvdata(dev);

Make I8K depend on PROC_FS and HWMON (instead of selecting HWMON -- it
is strongly preferred to not select entire subsystems).

Build tested in all possible combinations of SENSORS_DELL_SMM, I8K, and
PROC_FS.

Fixes: 039ae58503f3 ("hwmon: Allow to compile dell-smm-hwmon driver without /proc/i8k")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Pali Rohár <pali@kernel.org>
Link: https://lkml.kernel.org/r/20210910071921.16777-1-rdunlap@infradead.org


# 71188590 06-Oct-2021 Borislav Petkov <bp@suse.de>

x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically

This Kconfig option was added initially so that memory encryption is
enabled by default on machines which support it.

However, devices which have DMA masks that are less than the bit
position of the encryption bit, aka C-bit, require the use of an IOMMU
or the use of SWIOTLB.

If the IOMMU is disabled or in passthrough mode, the kernel would switch
to SWIOTLB bounce-buffering for those transfers.

In order to avoid that,

2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")

disables the default IOMMU passthrough mode so that devices for which the
default 256K DMA is insufficient, can use the IOMMU instead.

However 2, there are cases where the IOMMU is disabled in the BIOS, etc.
(think the usual hardware folk "oops, I dropped the ball there" cases) or a
driver doesn't properly use the DMA APIs or a device has a firmware or
hardware bug, e.g.:

ea68573d408f ("drm/amdgpu: Fail to load on RAVEN if SME is active")

However 3, in the above GPU use case, there are APIs like Vulkan and
some OpenGL/OpenCL extensions which are under the assumption that
user-allocated memory can be passed in to the kernel driver and both the
GPU and CPU can do coherent and concurrent access to the same memory.
That cannot work with SWIOTLB bounce buffers, of course.

So, in order for those devices to function, drop the "default y" for the
SME by default active option so that users who want to have SME enabled,
will need to either enable it in their config or use "mem_encrypt=on" on
the kernel command line.

[ tlendacky: Generalize commit message. ]

Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df09c@molgen.mpg.de


# 951cd3a0 28-Sep-2021 Arnd Bergmann <arnd@arndb.de>

firmware: include drivers/firmware/Kconfig unconditionally

Compile-testing drivers that require access to a firmware layer
fails when that firmware symbol is unavailable. This happened
twice this week:

- My proposed to change to rework the QCOM_SCM firmware symbol
broke on ppc64 and others.

- The cs_dsp firmware patch added device specific firmware loader
into drivers/firmware, which broke on the same set of
architectures.

We should probably do the same thing for other subsystems as well,
but fix this one first as this is a dependency for other patches
getting merged.

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Charles Keepax <ckeepax@opensource.cirrus.com>
Cc: Simon Trimmer <simont@opensource.cirrus.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 225bac2d 03-Aug-2021 Lukas Bulwahn <lukas.bulwahn@gmail.com>

x86/Kconfig: Correct reference to MWINCHIP3D

Commit in Fixes intended to exclude the Winchip series and referred to
CONFIG_WINCHIP3D, but the config symbol is called CONFIG_MWINCHIP3D.

Hence, scripts/checkkconfigsymbols.py warns:

WINCHIP3D
Referencing files: arch/x86/Kconfig

Correct the reference to the intended config symbol.

Fixes: 69b8d3fcabdc ("x86/Kconfig: Exclude i586-class CPUs lacking PAE support from the HIGHMEM64G Kconfig group")
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20210803113531.30720-4-lukas.bulwahn@gmail.com


# 794d5b8a 16-Sep-2021 Jan Beulich <jbeulich@suse.com>

swiotlb-xen: this is PV-only on x86

The code is unreachable for HVM or PVH, and it also makes little sense
in auto-translated environments. On Arm, with
xen_{create,destroy}_contiguous_region() both being stubs, I have a hard
time seeing what good the Xen specific variant does - the generic one
ought to be fine for all purposes there. Still Arm code explicitly
references symbols here, so the code will continue to be included there.

Instead of making PCI_XEN's "select" conditional, simply drop it -
SWIOTLB_XEN will be available unconditionally in the PV case anyway, and
is - as explained above - dead code in non-PV environments.

This in turn allows dropping the stubs for
xen_{create,destroy}_contiguous_region(), the former of which was broken
anyway - it failed to set the DMA handle output.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

Link: https://lore.kernel.org/r/5947b8ae-fdc7-225c-4838-84712265fc1e@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>


# d7109fe3 26-Aug-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

x86/platform: Increase maximum GPIO number for X86_64

By default the 512 GPIOs is the maximum on any x86 platform.
With, for example, Intel Tiger Lake-H the SoC based controller
occupies up to 480 pins. This leaves only 32 available for
GPIO expanders or other drivers, like PMIC. Hence, bump the
maximum GPIO number to 1024 for X86_64 and leave 512 for X86_32.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20210826150317.29435-1-andriy.shevchenko@linux.intel.com


# 4aae683f 30-Jul-2021 Masahiro Yamada <masahiroy@kernel.org>

tracing: Refactor TRACE_IRQFLAGS_SUPPORT in Kconfig

Make architectures select TRACE_IRQFLAGS_SUPPORT instead of
having many defines.

Link: https://lkml.kernel.org/r/20210731052233.4703-2-masahiroy@kernel.org

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>   #arch/arc
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e6226997 17-May-2021 Arnd Bergmann <arnd@arndb.de>

asm-generic: reverse GENERIC_{STRNCPY_FROM,STRNLEN}_USER symbols

Most architectures do not need a custom implementation, and in most
cases the generic implementation is preferred, so change the polariy
on these Kconfig symbols to require architectures to select them when
they provide their own version.

The new name is CONFIG_ARCH_HAS_{STRNCPY_FROM,STRNLEN}_USER.

The remaining architectures at the moment are: ia64, mips, parisc,
um and xtensa. We should probably convert these as well, but
I was not sure how far to take this series. Thomas Bogendoerfer
had some concerns about converting mips but may still do some
more detailed measurements to see which version is better.

Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# d391c582 25-Jun-2021 Javier Martinez Canillas <javierm@redhat.com>

drivers/firmware: move x86 Generic System Framebuffers support

The x86 architecture has generic support to register a system framebuffer
platform device. It either registers a "simple-framebuffer" if the config
option CONFIG_X86_SYSFB is enabled, or a legacy VGA/VBE/EFI FB device.

But the code is generic enough to be reused by other architectures and can
be moved out of the arch/x86 directory.

This will allow to also support the simple{fb,drm} drivers on non-x86 EFI
platforms, such as aarch64 where these drivers are only supported with DT.

Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210625130947.1803678-2-javierm@redhat.com


# b5f06f64 26-Apr-2021 Balbir Singh <sblbir@amazon.com>

x86/mm: Prepare for opt-in based L1D flush in switch_mm()

The goal of this is to allow tasks that want to protect sensitive
information, against e.g. the recently found snoop assisted data sampling
vulnerabilites, to flush their L1D on being switched out. This protects
their data from being snooped or leaked via side channels after the task
has context switched out.

This could also be used to wipe L1D when an untrusted task is switched in,
but that's not a really well defined scenario while the opt-in variant is
clearly defined.

The mechanism is default disabled and can be enabled on the kernel command
line.

Prepare for the actual prctl based opt-in:

1) Provide the necessary setup functionality similar to the other
mitigations and enable the static branch when the command line option
is set and the CPU provides support for hardware assisted L1D
flushing. Software based L1D flush is not supported because it's CPU
model specific and not really well defined.

This does not come with a sysfs file like the other mitigations
because it is not bound to any specific vulnerability.

Support has to be queried via the prctl(2) interface.

2) Add TIF_SPEC_L1D_FLUSH next to L1D_SPEC_IB so the two bits can be
mangled into the mm pointer in one go which allows to reuse the
existing mechanism in switch_mm() for the conditional IBPB speculation
barrier efficiently.

3) Add the L1D flush specific functionality which flushes L1D when the
outgoing task opted in.

Also check whether the incoming task has requested L1D flush and if so
validate that it is not accidentaly running on an SMT sibling as this
makes the whole excercise moot because SMT siblings share L1D which
opens tons of other attack vectors. If that happens schedule task work
which signals the incoming task on return to user/guest with SIGBUS as
this is part of the paranoid L1D flush contract.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-1-sblbir@amazon.com


# 094121ef 28-Jul-2021 Lukas Bulwahn <lukas.bulwahn@gmail.com>

arch: Kconfig: clean up obsolete use of HAVE_IDE

The arch-specific Kconfig files use HAVE_IDE to indicate if IDE is
supported.

As IDE support and the HAVE_IDE config vanishes with commit b7fb14d3ac63
("ide: remove the legacy ide driver"), there is no need to mention
HAVE_IDE in all those arch-specific Kconfig files.

The issue was identified with ./scripts/checkkconfigsymbols.py.

Fixes: b7fb14d3ac63 ("ide: remove the legacy ide driver")
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210728182115.4401-1-lukas.bulwahn@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 63703f37 30-Jun-2021 Kefeng Wang <wangkefeng.wang@huawei.com>

mm: generalize ZONE_[DMA|DMA32]

ZONE_[DMA|DMA32] configs have duplicate definitions on platforms that
subscribe to them. Instead, just make them generic options which can be
selected on applicable platforms.

Also only x86/arm64 architectures could enable both ZONE_DMA and
ZONE_DMA32 if EXPERT, add ARCH_HAS_ZONE_DMA_SET to make dma zone
configurable and visible on the two architectures.

Link: https://lkml.kernel.org/r/20210528074557.17768-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [RISC-V]
Acked-by: Michal Simek <michal.simek@xilinx.com> [microblaze]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cebc774f 30-Jun-2021 Anshuman Khandual <anshuman.khandual@arm.com>

mm/thp: make ARCH_ENABLE_SPLIT_PMD_PTLOCK dependent on PGTABLE_LEVELS > 2

ARCH_ENABLE_SPLIT_PMD_PTLOCK is irrelevant unless there are more than two
page table levels including PMD (also per
Documentation/vm/split_page_table_lock.rst). Make this dependency
explicit on remaining platforms i.e x86 and s390 where
ARCH_ENABLE_SPLIT_PMD_PTLOCK is subscribed.

Link: https://lkml.kernel.org/r/1622013501-20409-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> # s390
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 51c2ee6d 21-Jun-2021 Nick Desaulniers <ndesaulniers@google.com>

Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR

We don't want compiler instrumentation to touch noinstr functions,
which are annotated with the no_profile_instrument_function function
attribute. Add a Kconfig test for this and make GCOV depend on it, and
in the future, PGO.

If an architecture is using noinstr, it should denote that via this
Kconfig value. That makes Kconfigs that depend on noinstr able to express
dependencies in an architecturally agnostic way.

Cc: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/lkml/YMTn9yjuemKFLbws@hirez.programming.kicks-ass.net/
Link: https://lore.kernel.org/lkml/YMcssV%2Fn5IBGv4f0@hirez.programming.kicks-ass.net/
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210621231822.2848305-4-ndesaulniers@google.com


# 583bfd48 29-Apr-2021 Nathan Chancellor <nathan@kernel.org>

x86, lto: Enable Clang LTO for 32-bit as well

Commit b33fff07e3e3 ("x86, build: allow LTO to be selected") enabled
support for LTO for x86_64 but 32-bit works fine as well.

I tested the following config combinations:

* i386_defconfig + CONFIG_LTO_CLANG_FULL=y

* i386_defconfig + CONFIG_LTO_CLANG_THIN=y

* ARCH=i386 allmodconfig + CONFIG_LTO_CLANG_THIN=y

with LLVM 11.1.0, 12.0.0, and 13.0.0 from git without any build
failures. The defconfigs boot in QEMU with no new warnings.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210429232611.3966964-1-nathan@kernel.org


# a9ee6cf5 28-Jun-2021 Mike Rapoport <rppt@kernel.org>

mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA

After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA
configuration options are equivalent.

Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead.

Done with

$ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \
$(git grep -wl CONFIG_NEED_MULTIPLE_NODES)
$ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \
$(git grep -wl NEED_MULTIPLE_NODES)

with manual tweaks afterwards.

[rppt@linux.ibm.com: fix arm boot crash]
Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com

Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1a6a9044 01-Jun-2021 Mike Rapoport <rppt@kernel.org>

x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options

The CONFIG_X86_RESERVE_LOW build time and reservelow= command line option
allowed to control the amount of memory under 1M that would be reserved at
boot to avoid using memory that can be potentially clobbered by BIOS.

Since the entire range under 1M is always reserved there is no need for
these options anymore and they can be removed.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210601075354.5149-3-rppt@kernel.org


# 3c188518 25-May-2021 Mark Rutland <mark.rutland@arm.com>

locking/atomic: delete !ARCH_ATOMIC remnants

Now that all architectures implement ARCH_ATOMIC, we can make it
mandatory, removing the Kconfig symbol and logic for !ARCH_ATOMIC.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210525140232.53872-33-mark.rutland@arm.com


# 9be85de9 25-May-2021 Mark Rutland <mark.rutland@arm.com>

locking/atomic: make ARCH_ATOMIC a Kconfig symbol

Subsequent patches will move architectures over to the ARCH_ATOMIC API,
after preparing the asm-generic atomic implementations to function with
or without ARCH_ATOMIC.

As some architectures use the asm-generic implementations exclusively
(and don't have a local atomic.h), and to avoid the risk that
ARCH_ATOMIC isn't defined in some cases we expect, let's make the
ARCH_ATOMIC macro a Kconfig symbol instead, so that we can guarantee it
is consistently available where needed.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210525140232.53872-2-mark.rutland@arm.com


# f91ef222 04-May-2021 Oscar Salvador <osalvador@suse.de>

x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE

Enable x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature.

Link: https://lkml.kernel.org/r/20210421102701.25051-8-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 66f24fa7 04-May-2021 Anshuman Khandual <anshuman.khandual@arm.com>

mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK

ARCH_ENABLE_SPLIT_PMD_PTLOCKS has duplicate definitions on platforms
that subscribe it. Drop these redundant definitions and instead just
select it on applicable platforms.

Link: https://lkml.kernel.org/r/1617259448-22529-6-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390]
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1e866974 04-May-2021 Anshuman Khandual <anshuman.khandual@arm.com>

mm: drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION

ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION configs have duplicate definitions on
platforms that subscribe them. Drop these reduntant definitions and
instead just select them appropriately.

[akpm@linux-foundation.org: s/x86_64/X86_64/, per Oscar]

Link: https://lkml.kernel.org/r/1617259448-22529-5-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 91024b3c 04-May-2021 Anshuman Khandual <anshuman.khandual@arm.com>

mm: generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]

ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] configs have duplicate
definitions on platforms that subscribe them. Instead, just make them
generic options which can be selected on applicable platforms.

Link: https://lkml.kernel.org/r/1617259448-22529-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390]
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c2280be8 04-May-2021 Anshuman Khandual <anshuman.khandual@arm.com>

mm: generalize ARCH_HAS_CACHE_LINE_SIZE

Patch series "mm: some config cleanups", v2.

This series contains config cleanup patches which reduces code
duplication across platforms and also improves maintainability. There
is no functional change intended with this series.

This patch (of 6):

ARCH_HAS_CACHE_LINE_SIZE config has duplicate definitions on platforms
that subscribe it. Instead, just make it a generic option which can be
selected on applicable platforms. This change reduces code duplication
and makes it cleaner.

Link: https://lkml.kernel.org/r/1617259448-22529-1-git-send-email-anshuman.khandual@arm.com
Link: https://lkml.kernel.org/r/1617259448-22529-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7677f7fd 04-May-2021 Axel Rasmussen <axelrasmussen@google.com>

userfaultfd: add minor fault registration mode

Patch series "userfaultfd: add minor fault handling", v9.

Overview
========

This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
When enabled (via the UFFDIO_API ioctl), this feature means that any
hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
get events for "minor" faults. By "minor" fault, I mean the following
situation:

Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory). One of the mappings is registered with userfaultfd (in minor
mode), and the other is not. Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents. The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault. As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.

We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea
is, userspace resolves the fault by either a) doing nothing if the
contents are already correct, or b) updating the underlying contents using
the second, non-UFFD mapping (via memcpy/memset or similar, or something
fancier like RDMA, or etc...). In either case, userspace issues
UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
correct, carry on setting up the mapping".

Use Case
========

Consider the use case of VM live migration (e.g. under QEMU/KVM):

1. While a VM is still running, we copy the contents of its memory to a
target machine. The pages are populated on the target by writing to the
non-UFFD mapping, using the setup described above. The VM is still running
(and therefore its memory is likely changing), so this may be repeated
several times, until we decide the target is "up to date enough".

2. We pause the VM on the source, and start executing on the target machine.
During this gap, the VM's user(s) will *see* a pause, so it is desirable to
minimize this window.

3. Between the last time any page was copied from the source to the target, and
when the VM was paused, the contents of that page may have changed - and
therefore the copy we have on the target machine is out of date. Although we
can keep track of which pages are out of date, for VMs with large amounts of
memory, it is "slow" to transfer this information to the target machine. We
want to resume execution before such a transfer would complete.

4. So, the guest begins executing on the target machine. The first time it
touches its memory (via the UFFD-registered mapping), userspace wants to
intercept this fault. Userspace checks whether or not the page is up to date,
and if not, copies the updated page from the source machine, via the non-UFFD
mapping. Finally, whether a copy was performed or not, userspace issues a
UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
are correct, carry on setting up the mapping".

We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.

Interaction with Existing APIs
==============================

Because this is a feature, a registered VMA could potentially receive both
missing and minor faults. I spent some time thinking through how the
existing API interacts with the new feature:

UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault:

- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.

UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
Without modifications, the existing codepath assumes a new page needs to
be allocated. This is okay, since userspace must have a second
non-UFFD-registered mapping anyway, thus there isn't much reason to want
to use these in any case (just memcpy or memset or similar).

- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
-ENOENT in that case (regardless of the kind of fault).

Future Work
===========

This series only supports hugetlbfs. I have a second series in flight to
support shmem as well, extending the functionality. This series is more
mature than the shmem support at this point, and the functionality works
fully on hugetlbfs, so this series can be merged first and then shmem
support will follow.

This patch (of 6):

This feature allows userspace to intercept "minor" faults. By "minor"
faults, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s). One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not. Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents. The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault. As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.

This commit adds the new registration mode, and sets the relevant flag on
the VMAs being registered. In the hugetlb fault path, if we find that we
have huge_pte_none(), but find_lock_page() does indeed find an existing
page, then we have a "minor" fault, and if the VMA has the userfaultfd
registration flag, we call into userfaultfd to handle it.

This is implemented as a new registration mode, instead of an API feature.
This is because the alternative implementation has significant drawbacks
[1].

However, doing it this was requires we allocate a VM_* flag for the new
registration mode. On 32-bit systems, there are no unused bits, so this
feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.

[1] https://lore.kernel.org/patchwork/patch/1380226/

[peterx@redhat.com: fix minor fault page leak]
Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com

Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dce44566 29-Apr-2021 Anshuman Khandual <anshuman.khandual@arm.com>

mm/memtest: add ARCH_USE_MEMTEST

early_memtest() does not get called from all architectures. Hence
enabling CONFIG_MEMTEST and providing a valid memtest=[1..N] kernel
command line option might not trigger the memory pattern tests as would be
expected in normal circumstances. This situation is misleading.

The change here prevents the above mentioned problem after introducing a
new config option ARCH_USE_MEMTEST that should be subscribed on platforms
that call early_memtest(), in order to enable the config CONFIG_MEMTEST.
Conversely CONFIG_MEMTEST cannot be enabled on platforms where it would
not be tested anyway.

Link: https://lkml.kernel.org/r/1617269193-22294-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> (arm64)
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3fb0fdb3 13-Feb-2021 Andy Lutomirski <luto@kernel.org>

x86/stackprotector/32: Make the canary into a regular percpu variable

On 32-bit kernels, the stackprotector canary is quite nasty -- it is
stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
percpu storage. It's even nastier because it means that whether %gs
contains userspace state or kernel state while running kernel code
depends on whether stackprotector is enabled (this is
CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
that segment selectors work. Supporting both variants is a
maintenance and testing mess.

Merely rearranging so that percpu and the stack canary
share the same segment would be messy as the 32-bit percpu address
layout isn't currently compatible with putting a variable at a fixed
offset.

Fortunately, GCC 8.1 added options that allow the stack canary to be
accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
percpu variable. This lets us get rid of all of the code to manage the
stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.

(That name is special. We could use any symbol we want for the
%fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
name other than __stack_chk_guard.)

Forcibly disable stackprotector on older compilers that don't support
the new options and turn the stack canary into a percpu variable. The
"lazy GS" approach is now used for all 32-bit configurations.

Also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels,
it loads the GS selector and updates the user GSBASE accordingly. (This
is unchanged.) On 32-bit kernels, it loads the GS selector and updates
GSBASE, which is now always the user base. This means that the overall
effect is the same on 32-bit and 64-bit, which avoids some ifdeffery.

[ bp: Massage commit message. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org


# c2209ea5 20-Apr-2021 Ingo Molnar <mingo@kernel.org>

x86/platform/uv: Fix !KEXEC build failure

When KEXEC is disabled, the UV build fails:

arch/x86/platform/uv/uv_nmi.c:875:14: error: ‘uv_nmi_kexec_failed’ undeclared (first use in this function)

Since uv_nmi_kexec_failed is only defined in the KEXEC_CORE #ifdef branch,
this code cannot ever have been build tested:

if (main)
pr_err("UV: NMI kdump: KEXEC not supported in this kernel\n");
atomic_set(&uv_nmi_kexec_failed, 1);

Nor is this use possible in uv_handle_nmi():

atomic_set(&uv_nmi_kexec_failed, 0);

These bugs were introduced in this commit:

d0a9964e9873: ("x86/platform/uv: Implement simple dump failover if kdump fails")

Which added the uv_nmi_kexec_failed assignments to !KEXEC code, while making the
definition KEXEC-only - apparently without testing the !KEXEC case.

Instead of complicating the #ifdef maze, simplify the code by requiring X86_UV
to depend on KEXEC_CORE. This pattern is present in other architectures as well.

( We'll remove the untested, 7 years old !KEXEC complications from the file in a
separate commit. )

Fixes: d0a9964e9873: ("x86/platform/uv: Implement simple dump failover if kdump fails")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Travis <travis@sgi.com>
Cc: linux-kernel@vger.kernel.org


# 0ef3439c 13-Apr-2021 Maciej W. Rozycki <macro@orcam.me.uk>

x86/build: Disable HIGHMEM64G selection for M486SX

Fix a regression caused by making the 486SX separately selectable in
Kconfig, for which the HIGHMEM64G setting has not been updated and
therefore has become exposed as a user-selectable option for the M486SX
configuration setting unlike with original M486 and all the other
settings that choose non-PAE-enabled processors:

High Memory Support
> 1. off (NOHIGHMEM)
2. 4GB (HIGHMEM4G)
3. 64GB (HIGHMEM64G)
choice[1-3?]:

With the fix in place the setting is now correctly removed:

High Memory Support
> 1. off (NOHIGHMEM)
2. 4GB (HIGHMEM4G)
choice[1-2?]:

[ bp: Massage commit message. ]

Fixes: 87d6021b8143 ("x86/math-emu: Limit MATH_EMULATION to 486SX compatibles")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.5+
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.2104141221340.44318@angie.orcam.me.uk


# fe950f60 01-Apr-2021 Kees Cook <keescook@chromium.org>

x86/entry: Enable random_kstack_offset support

Allow for a randomized stack offset on a per-syscall basis, with roughly
5-6 bits of entropy, depending on compiler and word size. Since the
method of offsetting uses macros, this cannot live in the common entry
code (the stack offset needs to be retained for the life of the syscall,
which means it needs to happen at the actual entry point).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210401232347.2791257-5-keescook@chromium.org


# 901ddbb9 17-Mar-2021 Jarkko Sakkinen <jarkko@kernel.org>

x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()

Background
==========

SGX enclave memory is enumerated by the processor in contiguous physical
ranges called Enclave Page Cache (EPC) sections. Currently, there is a
free list per section, but allocations simply target the lowest-numbered
sections. This is functional, but has no NUMA awareness.

Fortunately, EPC sections are covered by entries in the ACPI SRAT table.
These entries allow each EPC section to be associated with a NUMA node,
just like normal RAM.

Solution
========

Implement a NUMA-aware enclave page allocator. Mirror the buddy allocator
and maintain a list of enclave pages for each NUMA node. Attempt to
allocate enclave memory first from local nodes, then fall back to other
nodes.

Note that the fallback is not as sophisticated as the buddy allocator
and is itself not aware of NUMA distances. When a node's free list is
empty, it searches for the next-highest node with enclave pages (and
will wrap if necessary). This could be improved in the future.

Other
=====

NUMA_KEEP_MEMINFO dependency is required for phys_to_target_node().

[ Kai Huang: Do not return NULL from __sgx_alloc_epc_page() because
callers do not expect that and that leads to a NULL ptr deref. ]

[ dhansen: Fix an uninitialized 'nid' variable in
__sgx_alloc_epc_page() as

Reported-by: kernel test robot <lkp@intel.com>

to avoid any potential allocations from the wrong NUMA node or even
premature allocation failures. ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/
Link: https://lkml.kernel.org/r/20210319040602.178558-1-kai.huang@intel.com
Link: https://lkml.kernel.org/r/20210318214933.29341-1-dave.hansen@intel.com
Link: https://lkml.kernel.org/r/20210317235332.362001-2-jarkko.sakkinen@intel.com


# a0e2bf7c 11-Mar-2021 Juergen Gross <jgross@suse.com>

x86/paravirt: Switch time pvops functions to use static_call()

The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.

Switch them to use the static_call() mechanism instead of pvops, as
this allows quite some simplification of the pvops implementation.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-5-jgross@suse.com


# 22916417 04-Mar-2021 Tom Lendacky <thomas.lendacky@amd.com>

x86/virtio: Have SEV guests enforce restricted virtio memory access

An SEV guest requires that virtio devices use the DMA API to allow the
hypervisor to successfully access guest memory as needed.

The VIRTIO_F_VERSION_1 and VIRTIO_F_ACCESS_PLATFORM features tell virtio
to use the DMA API. Add arch_has_restricted_virtio_memory_access() for
x86, to fail the device probe if these features have not been set for the
device when running as an SEV guest.

[ bp: Fix -Wmissing-prototypes warning
Reported-by: kernel test robot <lkp@intel.com> ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/b46e0211f77ca1831f11132f969d470a6ffc9267.1614897610.git.thomas.lendacky@amd.com


# 1dc0da6e 25-Feb-2021 Alexander Potapenko <glider@google.com>

x86, kfence: enable KFENCE for x86

Add architecture specific implementation details for KFENCE and enable
KFENCE for the x86 architecture. In particular, this implements the
required interface in <asm/kfence.h> for setting up the pool and
providing helper functions for protecting and unprotecting pages.

For x86, we need to ensure that the pool uses 4K pages, which is done
using the set_memory_4k() helper function.

[elver@google.com: add missing copyright and description header]
Link: https://lkml.kernel.org/r/20210118092159.145934-2-elver@google.com

Link: https://lkml.kernel.org/r/20201103175841.3495947-3-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4590d98f 11-Feb-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

sfi: Remove framework for deprecated firmware

SFI-based platforms are gone. So does this framework.

This removes mention of SFI through the drivers and other code as well.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# cd1a41ce 09-Feb-2021 Thomas Gleixner <tglx@linutronix.de>

softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig

To prepare for inlining do_softirq_own_stack() replace
__ARCH_HAS_DO_SOFTIRQ with a Kconfig switch and select it in the affected
architectures.

This allows in the next step to move the function prototype and the inline
stub into a seperate asm-generic header file which is required to avoid
include recursion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002513.181713427@linutronix.de


# 624db9ea 09-Feb-2021 Thomas Gleixner <tglx@linutronix.de>

x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK

Now that all invocations of irq_exit_rcu() happen on the irq stack, turn on
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK which causes the core code to invoke
__do_softirq() directly without going through do_softirq_own_stack().

That means do_softirq_own_stack() is only invoked from task context which
means it can't be on the irq stack. Remove the conditional from
run_softirq_on_irqstack_cond() and rename the function accordingly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002513.068033456@linutronix.de


# 1b79fc4f 25-Jan-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

x86/apb_timer: Remove driver for deprecated platform

Intel Moorestown and Medfield are quite old Intel Atom based
32-bit platforms, which were in limited use in some Android phones,
tablets and consumer electronics more than eight years ago.

There are no bugs or problems ever reported outside from Intel
for breaking any of that platforms for years. It seems no real
users exists who run more or less fresh kernel on it. Commit
05f4434bc130 ("ASoC: Intel: remove mfld_machine") is also in align
with this theory.

Due to above and to reduce a burden of supporting outdated drivers,
remove the support for outdated platforms completely.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b33fff07 10-Apr-2019 Sami Tolvanen <samitolvanen@google.com>

x86, build: allow LTO to be selected

Pass code model and stack alignment to the linker as these are not
stored in LLVM bitcode, and allow CONFIG_LTO_CLANG* to be enabled.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>


# 6dafca97 06-Aug-2020 Sami Tolvanen <samitolvanen@google.com>

x86, build: use objtool mcount

Select HAVE_OBJTOOL_MCOUNT if STACK_VALIDATION is selected to use
objtool to generate __mcount_loc sections for dynamic ftrace with
Clang and gcc <5 (later versions of gcc use -mrecord-mcount).

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>


# 6ef869e0 18-Jan-2021 Michal Hocko <mhocko@kernel.org>

preempt: Introduce CONFIG_PREEMPT_DYNAMIC

Preemption mode selection is currently hardcoded on Kconfig choices.
Introduce a dedicated option to tune preemption flavour at boot time,

This will be only available on architectures efficiently supporting
static calls in order not to tempt with the feature against additional
overhead that might be prohibitive or undesirable.

CONFIG_PREEMPT_DYNAMIC is automatically selected by CONFIG_PREEMPT if
the architecture provides the necessary support (CONFIG_STATIC_CALL_INLINE,
CONFIG_GENERIC_ENTRY, and provide with __preempt_schedule_function() /
__preempt_schedule_notrace_function()).

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[peterz: relax requirement to HAVE_STATIC_CALL]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-5-frederic@kernel.org


# a6a0683b 14-Jan-2021 Viresh Kumar <viresh.kumar@linaro.org>

arch: x86: Remove CONFIG_OPROFILE support

The "oprofile" user-space tools don't use the kernel OPROFILE support
any more, and haven't in a long time. User-space has been converted to
the perf interfaces.

Remove the old oprofile's architecture specific support.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: William Cohen <wcohen@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>


# 41026c34 02-Dec-2020 Al Viro <viro@zeniv.linux.org.uk>

Kconfig: regularize selection of CONFIG_BINFMT_ELF

with mips converted to use of fs/config_binfmt_elf.c, there's no
need to keep selects of that thing all over arch/* - we can simply
turn into def_bool y if COMPAT && BINFMT_ELF (in fs/Kconfig.binfmt)
and get rid of all selects.

Several architectures got those selects wrong (e.g. you could
end up with sparc64 sans BINFMT_ELF, with select violating
dependencies, etc.)

Randy Dunlap has spotted some of those; IMO this is simpler than
his fix, but it depends upon the stuff that would need to be
backported, so we might end up using his variant for -stable.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 85f2ada7 04-Jan-2021 Al Viro <viro@zeniv.linux.org.uk>

x32: make X32, !IA32_EMULATION setups able to execute x32 binaries

It's really trivial - the only wrinkle is making sure that
compiler knows that ia32-related side of COMPAT_ARCH_DLINFO
is dead code on such configs (we don't get there without
having passed compat_elf_check_arch(), and on such configs
that'll fail for ia32 binary).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 7facdc42 13-Jun-2020 Al Viro <viro@zeniv.linux.org.uk>

[amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly

To get rid of hardcoded size/offset in those macros we need to have
a definition of i386 variant of struct elf_prstatus. However, we can't
do that in asm/compat.h - the types needed for that are not there and
adding an include of asm/user32.h into asm/compat.h would cause a lot
of mess.

That could be conveniently done in elfcore-compat.h, but currently there
is nowhere to put arch-dependent parts of it - no asm/elfcore-compat.h.
So we introduce a new file (asm/elfcore-compat.h, present on architectures
that have CONFIG_ARCH_HAS_ELFCORE_COMPAT set, currently only on x86),
have it pulled by linux/elfcore-compat.h and move the definitions there.

As a side benefit, we don't need to worry about accidental inclusion of
that file into binfmt_elf.c itself, so we don't need the dance with
COMPAT_PRSTATUS_SIZE, etc. - only fs/compat_binfmt_elf.c will see
that header.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 9223d0dc 07-Jan-2021 Borislav Petkov <bp@suse.de>

thermal: Move therm_throt there from x86/mce

This functionality has nothing to do with MCE, move it to the thermal
framework and untangle it from MCE.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lkml.kernel.org/r/20210202121003.GD18075@zn.tnic


# 3228e1dc 04-Feb-2021 Anand K Mistry <amistry@google.com>

x86/Kconfig: Remove HPET_EMULATE_RTC depends on RTC

The RTC config option was removed in commit f52ef24be21a ("rtc/alpha:
remove legacy rtc driver")

Signed-off-by: Anand K Mistry <amistry@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210204183205.1.If5c6ded53a00ecad6a02a1e974316291cc0239d1@changeid


# 2ca408d9 30-Nov-2020 Brian Gerst <brgerst@gmail.com>

fanotify: Fix sys_fanotify_mark() on native x86-32

Commit

121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")

converted native x86-32 which take 64-bit arguments to use the
compat handlers to allow conversion to passing args via pt_regs.
sys_fanotify_mark() was however missed, as it has a general compat
handler. Add a config option that will use the syscall wrapper that
takes the split args for native 32-bit.

[ bp: Fix typo in Kconfig help text. ]

Fixes: 121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")
Reported-by: Paweł Jasiak <pawel@jasiak.xyz>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com


# 5d6ad668 14-Dec-2020 Mike Rapoport <rppt@kernel.org>

arch, mm: restore dependency of __kernel_map_pages() on DEBUG_PAGEALLOC

The design of DEBUG_PAGEALLOC presumes that __kernel_map_pages() must
never fail. With this assumption is wouldn't be safe to allow general
usage of this function.

Moreover, some architectures that implement __kernel_map_pages() have this
function guarded by #ifdef DEBUG_PAGEALLOC and some refuse to map/unmap
pages when page allocation debugging is disabled at runtime.

As all the users of __kernel_map_pages() were converted to use
debug_pagealloc_map_pages() it is safe to make it available only when
DEBUG_PAGEALLOC is set.

Link: https://lkml.kernel.org/r/20201109192128.960-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# be37c98d 14-Dec-2020 Kalesh Singh <kaleshsingh@google.com>

x86: mremap speedup - Enable HAVE_MOVE_PUD

HAVE_MOVE_PUD enables remapping pages at the PUD level if both the
source and destination addresses are PUD-aligned.

With HAVE_MOVE_PUD enabled it can be inferred that there is
approximately a 13x improvement in performance on x86. (See data
below).

------- Test Results ---------

The following results were obtained using a 5.4 kernel, by remapping
a PUD-aligned, 1GB sized region to a PUD-aligned destination.
The results from 10 iterations of the test are given below:

Total mremap times for 1GB data on x86. All times are in nanoseconds.

Control HAVE_MOVE_PUD

180394 15089
235728 14056
238931 25741
187330 13838
241742 14187
177925 14778
182758 14728
160872 14418
205813 15107
245722 13998

205721.5 15594 <-- Mean time in nanoseconds

A 1GB mremap completion time drops from ~205 microseconds
to ~15 microseconds on x86. (~13x speed up).

Link: https://lkml.kernel.org/r/20201014005320.2233162-6-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hassan Naveed <hnaveed@wavecomp.com>
Cc: Jia He <justin.he@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 59612b24 19-Nov-2020 Nathan Chancellor <nathan@kernel.org>

kbuild: Hoist '--orphan-handling' into Kconfig

Currently, '--orphan-handling=warn' is spread out across four different
architectures in their respective Makefiles, which makes it a little
unruly to deal with in case it needs to be disabled for a specific
linker version (in this case, ld.lld 10.0.1).

To make it easier to control this, hoist this warning into Kconfig and
the main Makefile so that disabling it is simpler, as the warning will
only be enabled in a couple places (main Makefile and a couple of
compressed boot folders that blow away LDFLAGS_vmlinx) and making it
conditional is easier due to Kconfig syntax. One small additional
benefit of this is saving a call to ld-option on incremental builds
because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN.

To keep the list of supported architectures the same, introduce
CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to
gain this automatically after all of the sections are specified and size
asserted. A special thanks to Kees Cook for the help text on this
config.

Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>


# 14df3267 18-Nov-2020 Thomas Gleixner <tglx@linutronix.de>

x86: Support kmap_local() forced debugging

kmap_local() and related interfaces are NOOPs on 64bit and only create
temporary fixmaps for highmem pages on 32bit. That means the test coverage
for this code is pretty small.

CONFIG_KMAP_LOCAL can be enabled independent from CONFIG_HIGHMEM, which
allows to provide support for enforced kmap_local() debugging even on
64bit.

For 32bit the support is unconditional, for 64bit it's only supported when
CONFIG_NR_CPUS <= 4096 as supporting it for 8192 CPUs would require to set
up yet another fixmap PGT.

If CONFIG_KMAP_LOCAL_FORCE_DEBUG is enabled then kmap_local()/kmap_atomic()
will use the temporary fixmap mapping path.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20201118204007.169209557@linutronix.de


# d1f250e2 17-Nov-2020 Frederic Weisbecker <frederic@kernel.org>

x86: Support HAVE_CONTEXT_TRACKING_OFFSTACK

A lot of ground work has been performed on x86 entry code. Fragile path
between user_enter() and user_exit() have IRQs disabled. Uses of RCU and
intrumentation in these fragile areas have been explicitly annotated
and protected.

This architecture doesn't need exception_enter()/exception_exit()
anymore and has therefore earned CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201117151637.259084-6-frederic@kernel.org


# e7e05452 12-Nov-2020 Sean Christopherson <seanjc@google.com>

x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections

Although carved out of normal DRAM, enclave memory is marked in the
system memory map as reserved and is not managed by the core mm. There
may be several regions spread across the system. Each contiguous region
is called an Enclave Page Cache (EPC) section. EPC sections are
enumerated via CPUID

Enclave pages can only be accessed when they are mapped as part of an
enclave, by a hardware thread running inside the enclave.

Parse CPUID data, create metadata for EPC pages and populate a simple
EPC page allocator. Although much smaller, ‘struct sgx_epc_page’
metadata is the SGX analog of the core mm ‘struct page’.

Similar to how the core mm’s page->flags encode zone and NUMA
information, embed the EPC section index to the first eight bits of
sgx_epc_page->desc. This allows a quick reverse lookup from EPC page to
EPC section. Existing client hardware supports only a single section,
while upcoming server hardware will support at most eight sections.
Thus, eight bits should be enough for long term needs.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Serge Ayoun <serge.ayoun@intel.com>
Signed-off-by: Serge Ayoun <serge.ayoun@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org


# 02a474ca 27-Oct-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default

Currently, the only way to get access to the registers of a function via a
ftrace callback is to set the "FL_SAVE_REGS" bit in the ftrace_ops. But as this
saves all regs as if a breakpoint were to trigger (for use with kprobes), it
is expensive.

The regs are already saved on the stack for the default ftrace callbacks, as
that is required otherwise a function being traced will get the wrong
arguments and possibly crash. And on x86, the arguments are already stored
where they would be on a pt_regs structure to use that code for both the
regs version of a callback, it makes sense to pass that information always
to all functions.

If an architecture does this (as x86_64 now does), it is to set
HAVE_DYNAMIC_FTRACE_WITH_ARGS, and this will let the generic code that it
could have access to arguments without having to set the flags.

This also includes having the stack pointer being saved, which could be used
for accessing arguments on the stack, as well as having the function graph
tracer not require its own trampoline!

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 157e118b 03-Nov-2020 Thomas Gleixner <tglx@linutronix.de>

x86/mm/highmem: Use generic kmap atomic implementation

Convert X86 to the generic kmap atomic implementation and make the
iomap_atomic() naming convention consistent while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201103095857.375127260@linutronix.de


# 0774a6ed 24-Sep-2020 Arnd Bergmann <arnd@arndb.de>

timekeeping: default GENERIC_CLOCKEVENTS to enabled

Almost all machines use GENERIC_CLOCKEVENTS, so it feels wrong to
require each one to select that symbol manually.

Instead, enable it whenever CONFIG_LEGACY_TIMER_TICK is disabled as
a simplification. It should be possible to select both
GENERIC_CLOCKEVENTS and LEGACY_TIMER_TICK from an architecture now
and decide at runtime between the two.

For the clockevents arch-support.txt file, this means that additional
architectures are marked as TODO when they have at least one machine
that still uses LEGACY_TIMER_TICK, rather than being marked 'ok' when
at least one machine has been converted. This means that both m68k and
arm (for riscpc) revert to TODO.

At this point, we could just always enable CONFIG_GENERIC_CLOCKEVENTS
rather than leaving it off when not needed. I built an m68k
defconfig kernel (using gcc-10.1.0) and found that this would add
around 5.5KB in kernel image size:

text data bss dec hex filename
3861936 1092236 196656 5150828 4e986c obj-m68k/vmlinux-no-clockevent
3866201 1093832 196184 5156217 4ead79 obj-m68k/vmlinux-clockevent

On Arm (MACH_RPC), that difference appears to be twice as large,
around 11KB on top of an 6MB vmlinux.

Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 282a181b 24-Sep-2020 YiFei Zhu <yifeifz2@illinois.edu>

seccomp: Move config option SECCOMP to arch/Kconfig

In order to make adding configurable features into seccomp easier,
it's better to have the options at one single location, considering
especially that the bulk of seccomp code is arch-independent. An quick
look also show that many SECCOMP descriptions are outdated; they talk
about /proc rather than prctl.

As a result of moving the config option and keeping it default on,
architectures arm, arm64, csky, riscv, sh, and xtensa did not have SECCOMP
on by default prior to this and SECCOMP will be default in this change.

Architectures microblaze, mips, powerpc, s390, sh, and sparc have an
outdated depend on PROC_FS and this dependency is removed in this change.

Suggested-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/lkml/CAG48ez1YWz9cnp08UZgeieYRhHdqh-ch7aNwc4JRBnGyrmgfMg@mail.gmail.com/
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
[kees: added HAVE_ARCH_SECCOMP help text, tweaked wording]
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/9ede6ef35c847e58d61e476c6a39540520066613.1600951211.git.yifeifz2@illinois.edu


# ec6347bb 05-Oct-2020 Dan Williams <dan.j.williams@intel.com>

x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()

In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.

Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:

On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > However now I see that copy_user_generic() works for the wrong reason.
> > It works because the exception on the source address due to poison
> > looks no different than a write fault on the user address to the
> > caller, it's still just a short copy. So it makes copy_to_user() work
> > for the wrong reason relative to the name.
>
> Right.
>
> And it won't work that way on other architectures. On x86, we have a
> generic function that can take faults on either side, and we use it
> for both cases (and for the "in_user" case too), but that's an
> artifact of the architecture oddity.
>
> In fact, it's probably wrong even on x86 - because it can hide bugs -
> but writing those things is painful enough that everybody prefers
> having just one function.

Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().

Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.

One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.

[ bp: Massage a bit. ]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com


# 7ca435cf 26-Aug-2020 Thomas Gleixner <tglx@linutronix.de>

x86/irq: Cleanup the arch_*_msi_irqs() leftovers

Get rid of all the gunk and remove the 'select PCI_MSI_ARCH_FALLBACK' from
the x86 Kconfig so the weak functions in the PCI core are replaced by stubs
which emit a warning, which ensures that any fail to set the irq domain
pointer results in a warning when the device is used.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200826112334.086003720@linutronix.de


# 077ee78e 26-Aug-2020 Thomas Gleixner <tglx@linutronix.de>

PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable

The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
requires them or not. Architectures which are fully utilizing hierarchical
irq domains should never call into that code.

It's not only architectures which depend on that by implementing one or
more of the weak functions, there is also a bunch of drivers which relies
on the weak functions which invoke msi_controller::setup_irq[s] and
msi_controller::teardown_irq.

Make the architectures and drivers which rely on them select them in Kconfig
and if not selected replace them by stub functions which emit a warning and
fail the PCI/MSI interrupt allocation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200826112333.992429909@linutronix.de


# 47058bb5 03-Sep-2020 Christoph Hellwig <hch@lst.de>

x86: remove address space overrides using set_fs()

Stop providing the possibility to override the address space using
set_fs() now that there is no need for that any more. To properly
handle the TASK_SIZE_MAX checking for 4 vs 5-level page tables on
x86 a new alternative is introduced, which just like the one in
entry_64.S has to use the hardcoded virtual address bits to escape
the fact that TASK_SIZE_MAX isn't actually a constant when 5-level
page tables are enabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 5e6e9852 03-Sep-2020 Christoph Hellwig <hch@lst.de>

uaccess: add infrastructure for kernel builds with set_fs()

Add a CONFIG_SET_FS option that is selected by architecturess that
implement set_fs, which is all of them initially. If the option is not
set stubs for routines related to overriding the address space are
provided so that architectures can start to opt out of providing set_fs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 597cfe48 07-Sep-2020 Joerg Roedel <jroedel@suse.de>

x86/boot/compressed/64: Setup a GHCB-based VC Exception handler

Install an exception handler for #VC exception that uses a GHCB. Also
add the infrastructure for handling different exit-codes by decoding
the instruction that caused the exception and error handling.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-24-joro@8bytes.org


# 1e7e4788 18-Aug-2020 Josh Poimboeuf <jpoimboe@redhat.com>

x86/static_call: Add inline static call implementation for x86-64

Add the inline static call implementation for x86-64. The generated code
is identical to the out-of-line case, except we move the trampoline into
it's own section.

Objtool uses the trampoline naming convention to detect all the call
sites. It then annotates those call sites in the .static_call_sites
section.

During boot (and module init), the call sites are patched to call
directly into the destination function. The temporary trampoline is
then no longer used.

[peterz: merged trampolines, put trampoline in section]

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20200818135804.864271425@infradead.org


# e6d6c071 18-Aug-2020 Josh Poimboeuf <jpoimboe@redhat.com>

x86/static_call: Add out-of-line static call implementation

Add the x86 out-of-line static call implementation. For each key, a
permanent trampoline is created which is the destination for all static
calls for the given key. The trampoline has a direct jump which gets
patched by static_call_update() when the destination function changes.

[peterz: fixed trampoline, rewrote patching code]

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20200818135804.804315175@infradead.org


# 00998085 29-Jul-2020 Thomas Gleixner <tglx@linutronix.de>

x86: Select POSIX_CPU_TIMERS_TASK_WORK

Move POSIX CPU timer expiry and signal delivery into task context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200730102337.888613724@linutronix.de


# fb46d057 30-Jul-2020 Nick Terrell <terrelln@fb.com>

x86: Add support for ZSTD compressed kernel

- Add support for zstd compressed kernel

- Define __DISABLE_EXPORTS in Makefile

- Remove __DISABLE_EXPORTS definition from kaslr.c

- Bump the heap size for zstd.

- Update the documentation.

Integrates the ZSTD decompression code to the x86 pre-boot code.

Zstandard requires slightly more memory during the kernel decompression
on x86 (192 KB vs 64 KB), and the memory usage is independent of the
window size.

__DISABLE_EXPORTS is now defined in the Makefile, which covers both
the existing use in kaslr.c, and the use needed by the zstd decompressor
in misc.c.

This patch has been boot tested with both a zstd and gzip compressed
kernel on i386 and x86_64 using buildroot and QEMU.

Additionally, this has been tested in production on x86_64 devices.
We saw a 2 second boot time reduction by switching kernel compression
from xz to zstd.

Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200730190841.2071656-7-nickrterrell@gmail.com


# 27d6b4d1 22-Jul-2020 Thomas Gleixner <tglx@linutronix.de>

x86/entry: Use generic syscall entry function

Replace the syscall entry work handling with the generic version. Provide
the necessary helper inlines to handle the real architecture specific
parts, e.g. ptrace.

Use a temporary define for idtentry_enter_user which will be cleaned up
seperately.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200722220520.376213694@linutronix.de


# 2f9237d4 08-Jul-2020 Christoph Hellwig <hch@lst.de>

dma-mapping: make support for dma ops optional

Avoid the overhead of the dma ops support for tiny builds that only
use the direct mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>


# 140c8180 24-May-2020 Christian Brauner <christian.brauner@ubuntu.com>

arch: remove HAVE_COPY_THREAD_TLS

All architectures support copy_thread_tls() now, so remove the legacy
copy_thread() function and the HAVE_COPY_THREAD_TLS config option. Everyone
uses the same process creation calling convention based on
copy_thread_tls() and struct kernel_clone_args. This will make it easier to
maintain the core process creation code under kernel/, simplifies the
callpaths and makes the identical for all architectures.

Cc: linux-arch@vger.kernel.org
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Greentime Hu <green.hu@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


# 0f1441b4 12-Jun-2020 Peter Zijlstra <peterz@infradead.org>

objtool: Fix noinstr vs KCOV

Since many compilers cannot disable KCOV with a function attribute,
help it to NOP out any __sanitizer_cov_*() calls injected in noinstr
code.

This turns:

12: e8 00 00 00 00 callq 17 <lockdep_hardirqs_on+0x17>
13: R_X86_64_PLT32 __sanitizer_cov_trace_pc-0x4

into:

12: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
13: R_X86_64_NONE __sanitizer_cov_trace_pc-0x4

Just like recordmcount does.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>


# b1d40575 25-May-2020 Vitaly Kuznetsov <vkuznets@redhat.com>

KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery

KVM now supports using interrupt for 'page ready' APF event delivery and
legacy mechanism was deprecated. Switch KVM guests to the new one.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-9-vkuznets@redhat.com>
[Use HYPERVISOR_CALLBACK_VECTOR instead of a separate vector. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# c8a59a4d 10-Jun-2020 Herbert Xu <herbert@gondor.apana.org.au>

x86/microcode: Do not select FW_LOADER

The x86 microcode support works just fine without FW_LOADER. In fact,
these days most people load microcode early during boot so FW_LOADER
never gets into the picture anyway.

As almost everyone on x86 needs to enable MICROCODE, this by extension
means that FW_LOADER is always built into the kernel even if nothing
uses it. The FW_LOADER system is about two thousand lines long and
contains user-space facing interfaces that could potentially provide an
entry point into the kernel (or beyond).

Remove the unnecessary select of FW_LOADER by MICROCODE. People who need
the FW_LOADER capability can still enable it.

[ bp: Massage a bit. ]

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200610042911.GA20058@gondor.apana.org.au


# a7f7f624 13-Jun-2020 Masahiro Yamada <masahiroy@kernel.org>

treewide: replace '---help---' in Kconfig files with 'help'

Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>


# fa5e5c40 21-May-2020 Thomas Gleixner <tglx@linutronix.de>

x86/entry: Use idtentry for interrupts

Replace the extra interrupt handling code and reuse the existing idtentry
machinery. This moves the irq stack switching on 64-bit from ASM to C code;
32-bit already does the stack switching in C.

This requires to remove HAVE_IRQ_EXIT_ON_IRQ_STACK as the stack switch is
not longer in the low level entry code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.078690991@linutronix.de


# 399145f9 04-Jun-2020 Anshuman Khandual <anshuman.khandual@arm.com>

mm/debug: add tests validating architecture page table helpers

This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.

This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
level along with populating intermediate entries with next page table page
and validating them.

Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
via late_initcall().

This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected.
Any architecture, which is willing to subscribe this test will need to
select ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64,
x86, s390 and powerpc platforms where the test is known to build and run
successfully Going forward, other architectures too can subscribe the test
after fixing any build or runtime problems with their page table helpers.

Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot. Any non conformity here
will be reported as an warning which would need to be fixed. This test
will help catch any changes to the agreed upon semantics expected from
generic MM and enable platforms to accommodate it thereafter.

[anshuman.khandual@arm.com: v17]
Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com
[anshuman.khandual@arm.com: v18]
Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [ppc32]
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0e96edd9 27-May-2020 Sean Christopherson <seanjc@google.com>

x86/kvm: Remove defunct KVM_DEBUG_FS Kconfig

Remove KVM_DEBUG_FS, which can easily be misconstrued as controlling
KVM-as-a-host. The sole user of CONFIG_KVM_DEBUG_FS was removed by
commit cfd8983f03c7b ("x86, locking/spinlocks: Remove ticket (spin)lock
implementation").

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200528031121.28904-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 7e01ccb4 03-Jun-2020 Zong Li <zong.li@sifive.com>

x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined

Extract DEBUG_WX to mm/Kconfig.debug for shared use. Change to use
ARCH_HAS_DEBUG_WX instead of DEBUG_WX defined by arch port.

Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/430736828d149df3f5b462d291e845ec690e0141.1587455584.git.zong.li@sifive.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# acd3f5c4 03-Jun-2020 Mike Rapoport <rppt@kernel.org>

mm: remove early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES

The memmap_init() function was made to iterate over memblock regions and
as the result the early_pfn_in_nid() function became obsolete. Since
CONFIG_NODES_SPAN_OTHER_NODES is only used to pick a stub or a real
implementation of early_pfn_in_nid(), it is also not needed anymore.

Remove both early_pfn_in_nid() and the CONFIG_NODES_SPAN_OTHER_NODES.

Co-developed-by: Hoan Tran <Hoan@os.amperecomputing.com>
Signed-off-by: Hoan Tran <Hoan@os.amperecomputing.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Hoan Tran <hoan@os.amperecomputing.com> [arm64]
Cc: Baoquan He <bhe@redhat.com>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200412194859.12663-17-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3f08a302 03-Jun-2020 Mike Rapoport <rppt@kernel.org>

mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option

CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization of
nodes and zones structures between the systems that have region to node
mapping in memblock and those that don't.

Currently all the NUMA architectures enable this option and for the
non-NUMA systems we can presume that all the memory belongs to node 0 and
therefore the compile time configuration option is not required.

The remaining few architectures that use DISCONTIGMEM without NUMA are
easily updated to use memblock_add_node() instead of memblock_add() and
thus have proper correspondence of memblock regions to NUMA nodes.

Still, free_area_init_node() must have a backward compatible version
because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP is
different. Once all the architectures will use the new semantics, the
entire compatibility layer can be dropped.

To avoid addition of extra run time memory to store node id for
architectures that keep memblock but have only a single node, the node id
field of the memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES and
the corresponding accessors presume that in those cases it is always 0.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Hoan Tran <hoan@os.amperecomputing.com> [arm64]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Baoquan He <bhe@redhat.com>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200412194859.12663-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 43173265 23-Feb-2020 Mike Rapoport <rppt@kernel.org>

x86/mm: Drop deprecated DISCONTIGMEM support for 32-bit

The DISCONTIGMEM support was marked as deprecated in v5.2 and since there
were no complaints about it for almost 5 releases it can be completely
removed.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200223094322.15206-1-rppt@kernel.org


# 38f3e775 28-May-2020 Babu Moger <babu.moger@amd.com>

x86/Kconfig: Update config and kernel doc for MPK feature on AMD

AMD's next generation of EPYC processors support the MPK (Memory
Protection Keys) feature. Update the dependency and documentation.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/159068199556.26992.17733929401377275140.stgit@naples-babu.amd.com


# b1a57bbf 07-May-2020 Douglas Anderson <dianders@chromium.org>

kgdb: Delay "kgdbwait" to dbg_late_init() by default

Using kgdb requires at least some level of architecture-level
initialization. If nothing else, it relies on the architecture to
pass breakpoints / crashes onto kgdb.

On some architectures this all works super early, specifically it
starts working at some point in time before Linux parses
early_params's. On other architectures it doesn't. A survey of a few
platforms:

a) x86: Presumably it all works early since "ekgdboc" is documented to
work here.
b) arm64: Catching crashes works; with a simple patch breakpoints can
also be made to work.
c) arm: Nothing in kgdb works until
paging_init() -> devicemaps_init() -> early_trap_init()

Let's be conservative and, by default, process "kgdbwait" (which tells
the kernel to drop into the debugger ASAP at boot) a bit later at
dbg_late_init() time. If an architecture has tested it and wants to
re-enable super early debugging, they can select the
ARCH_HAS_EARLY_DEBUG KConfig option. We'll do this for x86 to start.
It should be noted that dbg_late_init() is still called quite early in
the system.

Note that this patch doesn't affect when kgdb runs its init. If kgdb
is set to initialize early it will still initialize when parsing
early_param's. This patch _only_ inhibits the initial breakpoint from
"kgdbwait". This means:

* Without any extra patches arm64 platforms will at least catch
crashes after kgdb inits.
* arm platforms will catch crashes (and could handle a hardcoded
kgdb_breakpoint()) any time after early_trap_init() runs, even
before dbg_late_init().

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20200507130644.v4.4.I3113aea1b08d8ce36dc3720209392ae8b815201b@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>


# 0ebeea8c 14-May-2020 Daniel Borkmann <daniel@iogearbox.net>

bpf: Restrict bpf_probe_read{, str}() only to archs where they work

Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.

To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").

Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.

However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).

For the remaining archs, they can workaround easily by relying on the
feature probe from bpftool which spills out defines that can be used out
of BPF C code to implement the drop-in replacement for old/new kernels
via: bpftool feature probe macro

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net


# 82fef0ad 14-Apr-2020 David Rientjes <rientjes@google.com>

x86/mm: unencrypted non-blocking DMA allocations use coherent pools

When CONFIG_AMD_MEM_ENCRYPT is enabled and a device requires unencrypted
DMA, all non-blocking allocations must originate from the atomic DMA
coherent pools.

Select CONFIG_DMA_COHERENT_POOL for CONFIG_AMD_MEM_ENCRYPT.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 54b34aa0 16-Apr-2020 Mika Westerberg <mika.westerberg@linux.intel.com>

platform/x86: intel_scu_ipc: Split out SCU IPC functionality from the SCU driver

The SCU IPC functionality is usable outside of Intel MID devices. For
example modern Intel CPUs include the same thing but now it is called
PMC (Power Management Controller) instead of SCU. To make the IPC
available for those split the driver into core part (intel_scu_ipc.c)
and the SCU PCI driver part (intel_scu_pcidrv.c) which then calls the
former before it goes and creates rest of the SCU devices. The SCU IPC
will also register a new class that gets assigned to the device that is
created under the parent PCI device.

We also split the Kconfig symbols so that INTEL_SCU_IPC enables the SCU
IPC library and INTEL_SCU_PCI the SCU driver and convert the users
accordingly. While there remove default y from the INTEL_SCU_PCI symbol
as it is already selected by X86_INTEL_MID.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>


# b64d8d1e 20-Apr-2020 Peter Xu <peterx@redhat.com>

mm/userfaultfd: disable userfaultfd-wp on x86_32

Userfaultfd-wp is not yet working on 32bit hosts, but it's accidentally
enabled previously. Disable it.

Fixes: 5a281062af1d ("userfaultfd: wp: add WP pagetable tracking to x86")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Link: http://lkml.kernel.org/r/20200413141608.109211-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2ce0d7f9 16-Apr-2020 Mark Brown <broonie@kernel.org>

x86/asm: Provide a Kconfig symbol for disabling old assembly annotations

As x86 was converted to use the modern SYM_ annotations for assembly,
ifdefs were added to remove the generic definitions of the old style
annotations on x86. Rather than collect a list of architectures in the
ifdefs as more architectures are converted over, provide a Kconfig
symbol for this and update x86 to use it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lkml.kernel.org/r/20200416182402.6206-1-broonie@kernel.org


# 5e8ebd84 26-Mar-2020 Jason A. Donenfeld <Jason@zx2c4.com>

x86: probe assembler capabilities via kconfig instead of makefile

Doing this probing inside of the Makefiles means we have a maze of
ifdefs inside the source code and child Makefiles that need to make
proper decisions on this too. Instead, we do it at Kconfig time, like
many other compiler and assembler options, which allows us to set up the
dependencies normally for full compilation units. In the process, the
ADX test changes to use %eax instead of %r10 so that it's valid in both
32-bit and 64-bit mode.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>


# 5a281062 06-Apr-2020 Andrea Arcangeli <aarcange@redhat.com>

userfaultfd: wp: add WP pagetable tracking to x86

Accurate userfaultfd WP tracking is possible by tracking exactly which
virtual memory ranges were writeprotected by userland. We can't relay
only on the RW bit of the mapped pagetable because that information is
destroyed by fork() or KSM or swap. If we were to relay on that, we'd
need to stay on the safe side and generate false positive wp faults for
every swapped out page.

[peterx@redhat.com: append _PAGE_UFD_WP to _PAGE_CHG_MASK]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-4-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 25c619e5 13-Mar-2020 Brian Gerst <brgerst@gmail.com>

x86/entry/32: Enable pt_regs based syscalls

Enable pt_regs based syscalls for 32-bit. This makes the 32-bit native
kernel consistent with the 64-bit kernel, and improves the syscall
interface by not needing to push all 6 potential arguments onto the stack.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-17-brgerst@gmail.com


# 9e2b4be3 08-Mar-2020 Nayna Jain <nayna@linux.ibm.com>

ima: add a new CONFIG for loading arch-specific policies

Every time a new architecture defines the IMA architecture specific
functions - arch_ima_get_secureboot() and arch_ima_get_policy(), the IMA
include file needs to be updated. To avoid this "noise", this patch
defines a new IMA Kconfig IMA_SECURE_AND_OR_TRUSTED_BOOT option, allowing
the different architectures to select it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Philipp Rudo <prudo@linux.ibm.com> (s390)
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>


# 17e5888e 23-Jan-2020 Hans de Goede <hdegoede@redhat.com>

x86: Select HARDIRQS_SW_RESEND on x86

Modern x86 laptops are starting to use GPIO pins as interrupts more
and more, e.g. touchpads and touchscreens have almost all moved away
from PS/2 and USB to using I2C with a GPIO pin as interrupt.
Modern x86 laptops also have almost all moved to using s2idle instead
of using the system S3 ACPI power state to suspend.

The Intel and AMD pinctrl drivers do not define irq_retrigger handlers
for the irqchips they register, this is causing edge triggered interrupts
which happen while suspended using s2idle to get lost.

One specific example of this is the lid switch on some devices, lid
switches used to be handled by the embedded-controller, but now the
lid open/closed sensor is sometimes directly connected to a GPIO pin.
On most devices the ACPI code for this looks like this:

Method (_E00, ...) {
Notify (LID0, 0x80) // Status Change
}

Where _E00 is an ACPI event handler for changes on both edges of the GPIO
connected to the lid sensor, this event handler is then combined with an
_LID method which directly reads the pin. When the device is resumed by
opening the lid, the GPIO interrupt will wake the system, but because the
pinctrl irqchip doesn't have an irq_retrigger handler, the Notify will not
happen. This is not a problem in the case the _LID method directly reads
the GPIO, because the drivers/acpi/button.c code will call _LID on resume
anyways.

But some devices have an event handler for the GPIO connected to the
lid sensor which looks like this:

Method (_E00, ...) {
if (LID_GPIO == One)
LIDS = One
else
LIDS = Zero
Notify (LID0, 0x80) // Status Change
}

And the _LID method returns the cached LIDS value, since on open we
do not re-run the edge-interrupt handler when we re-enable IRQS on resume
(because of the missing irq_retrigger handler), _LID now will keep
reporting closed, as LIDS was never changed to reflect the open status,
this causes userspace to re-resume the laptop again shortly after opening
the lid.

The Intel GPIO controllers do not allow implementing irq_retrigger without
emulating it in software, at which point we are better of just using the
generic HARDIRQS_SW_RESEND mechanism rather then re-implementing software
emulation for this separately in aprox. 14 different pinctrl drivers.

Select HARDIRQS_SW_RESEND to solve the problem of edge-triggered GPIO
interrupts not being re-triggered on resume when they were triggered during
suspend (s2idle) and/or when they were the cause of the wakeup.

This requires

008f1d60fe25 ("x86/apic/vector: Force interupt handler invocation to irq context")
c16816acd086 ("genirq: Add protection against unsafe usage of generic_handle_irq()")

to protect the APIC based interrupts from being wreckaged by a software
resend.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200123210242.53367-1-hdegoede@redhat.com


# bdb04a1a 09-Mar-2020 Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>

x86/Kconfig: Drop vendor dependency for X86_UMIP

Some Centaur family 7 CPUs and Zhaoxin family 7 CPUs support the UMIP
feature too. The text size growth which UMIP adds is ~1K and distro
kernels enable it anyway so remove the vendor dependency.

[ bp: Rewrite commit message. ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1583733990-2587-1-git-send-email-TonyWWang-oc@zhaoxin.com


# 2a86f661 27-Feb-2020 Masahiro Yamada <masahiroy@kernel.org>

kbuild: use KBUILD_DEFCONFIG as the fallback for DEFCONFIG_LIST

Most of the Kconfig commands (except defconfig and all*config) read
the .config file as a base set of CONFIG options.

When it does not exist, the files in DEFCONFIG_LIST are searched in
this order and loaded if found.

I do not see much sense in the last two lines in DEFCONFIG_LIST.

[1] ARCH_DEFCONFIG

The entry for DEFCONFIG_LIST is guarded by 'depends on !UML'. So, the
ARCH_DEFCONFIG definition in arch/x86/um/Kconfig is meaningless.

arch/{sh,sparc,x86}/Kconfig define ARCH_DEFCONFIG depending on 32 or
64 bit variant symbols. This is a little bit strange; ARCH_DEFCONFIG
should be a fixed string because the base config file is loaded before
the symbol evaluation stage.

Using KBUILD_DEFCONFIG makes more sense because it is fixed before
Kconfig is invoked. Fortunately, arch/{sh,sparc,x86}/Makefile define it
in the same way, and it works as expected. Hence, replace ARCH_DEFCONFIG
with "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)".

[2] arch/$(ARCH)/defconfig

This file path is no longer valid. The defconfig files are always located
in the arch configs/ directories.

$ find arch -name defconfig | sort
arch/alpha/configs/defconfig
arch/arm64/configs/defconfig
arch/csky/configs/defconfig
arch/nds32/configs/defconfig
arch/riscv/configs/defconfig
arch/s390/configs/defconfig
arch/unicore32/configs/defconfig

The path arch/*/configs/defconfig is already covered by
"arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)". So, this file path is
not necessary.

I moved the default KBUILD_DEFCONFIG to the top Makefile. Otherwise,
the 7 architectures listed above would end up with endless loop of
syncconfig.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>


# 645e6466 23-Jan-2020 Anders Roxell <anders.roxell@linaro.org>

x86/Kconfig: Make CMDLINE_OVERRIDE depend on non-empty CMDLINE

When trying to boot an allmodconfig kernel that is built with
KCONFIG_ALLCONFIG=$(pwd)/arch/x86/configs/x86_64_defconfig, it doesn't
boot since CONFIG_CMDLINE_OVERRIDE gets enabled and that requires the
user to pass the full cmdline to CONFIG_CMDLINE.

Change so that CONFIG_CMDLINE_OVERRIDE gets set only if CONFIG_CMDLINE
is set to something except an empty string.

[ bp: touchup. ]

Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200124114615.11577-1-anders.roxell@linaro.org


# 7b27a862 16-Feb-2020 Dan Williams <dan.j.williams@intel.com>

libnvdimm/e820: Retrieve and populate correct 'target_node' info

Use the new phys_to_target_node() and numa_map_to_online_node() helpers
to retrieve the correct id for the 'numa_node' ("local" / online
initiator node) and 'target_node' (offline target memory node) sysfs
attributes.

Below is an example from a 4 NUMA node system where all the memory on
node2 is pmem / reserved. It should be noted that with the arrival of
the ACPI HMAT table and EFI Specific Purpose Memory the kernel will
start to see more platforms with reserved / performance differentiated
memory in its own NUMA node. Hence all the stakeholders on the Cc for
what is ostensibly a libnvdimm local patch.

=== Before ===

/* Notice no online memory on node2 at start */

# numactl --hardware
available: 3 nodes (0-1,3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
node 0 size: 3958 MB
node 0 free: 3708 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
node 1 size: 4027 MB
node 1 free: 3871 MB
node 3 cpus:
node 3 size: 3994 MB
node 3 free: 3971 MB
node distances:
node 0 1 3
0: 10 21 21
1: 21 10 21
3: 21 21 10

/*
* Put the pmem namespace into devdax mode so it can be assigned to the
* kmem driver
*/

# ndctl create-namespace -e namespace0.0 -m devdax -f
{
"dev":"namespace0.0",
"mode":"devdax",
"map":"dev",
"size":"3.94 GiB (4.23 GB)",
"uuid":"1650af9b-9ba3-4704-acd6-10178399d9a3",
[..]
}

/* Online Persistent Memory as System RAM */

# daxctl reconfigure-device --mode=system-ram dax0.0
libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success
[
{
"chardev":"dax0.0",
"size":4225761280,
"target_node":0,
"mode":"system-ram"
}
]
reconfigured 1 device

/* Note that the memory is onlined by default to the wrong node, node0 */

# numactl --hardware
available: 3 nodes (0-1,3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
node 0 size: 7926 MB
node 0 free: 7655 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
node 1 size: 4027 MB
node 1 free: 3871 MB
node 3 cpus:
node 3 size: 3994 MB
node 3 free: 3971 MB
node distances:
node 0 1 3
0: 10 21 21
1: 21 10 21
3: 21 21 10

=== After ===

/* Notice that the "phys_index" error messages are gone */

# daxctl reconfigure-device --mode=system-ram dax0.0
[
{
"chardev":"dax0.0",
"size":4225761280,
"target_node":2,
"mode":"system-ram"
}
]
reconfigured 1 device

/* Notice that node2 is now correctly populated */

# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
node 0 size: 3958 MB
node 0 free: 3793 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
node 1 size: 4027 MB
node 1 free: 3851 MB
node 2 cpus:
node 2 size: 3968 MB
node 2 free: 3968 MB
node 3 cpus:
node 3 size: 3994 MB
node 3 free: 3908 MB
node distances:
node 0 1 2 3
0: 10 21 21 21
1: 21 10 21 21
2: 21 21 10 21
3: 21 21 21 10

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/158188327614.894464.13122730362187722603.stgit@dwillia2-desk3.amr.corp.intel.com


# f86fd32d 07-Feb-2020 Thomas Gleixner <tglx@linutronix.de>

lib/vdso: Cleanup clock mode storage leftovers

Now that all architectures are converted to use the generic storage the
helpers and conditionals can be removed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20200207124403.470699892@linutronix.de


# b95a8a27 07-Feb-2020 Thomas Gleixner <tglx@linutronix.de>

x86/vdso: Use generic VDSO clock mode storage

Switch to the generic VDSO clock mode storage.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> (VDSO parts)
Acked-by: Juergen Gross <jgross@suse.com> (Xen parts)
Acked-by: Paolo Bonzini <pbonzini@redhat.com> (KVM parts)
Link: https://lkml.kernel.org/r/20200207124403.152039903@linutronix.de


# 68d87513 28-Jan-2020 Frederic Weisbecker <frederic@kernel.org>

x86: Remove TIF_NOHZ

Static keys have replaced TIF_NOHZ to optimize the calls to context
tracking. We can now safely remove that thread flag.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>


# 490f561b 27-Jan-2020 Frederic Weisbecker <frederic@kernel.org>

context-tracking: Introduce CONFIG_HAVE_TIF_NOHZ

A few archs (x86, arm, arm64) don't rely anymore on TIF_NOHZ to call
into context tracking on user entry/exit but instead use static keys
(or not) to optimize those calls. Ideally every arch should migrate to
that behaviour in the long run.

Settle a config option to let those archs remove their TIF_NOHZ
definitions.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>


# ff2e6d72 03-Feb-2020 Peter Zijlstra <peterz@infradead.org>

asm-generic/tlb: rename HAVE_RCU_TABLE_FREE

Towards a more consistent naming scheme.

[akpm@linux-foundation.org: fix sparc64 Kconfig]
Link: http://lkml.kernel.org/r/20200116064531.483522-7-aneesh.kumar@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2ae27137 03-Feb-2020 Steven Price <steven.price@arm.com>

x86: mm: convert dump_pagetables to use walk_page_range

Make use of the new functionality in walk_page_range to remove the arch
page walking code and use the generic code to walk the page tables.

The effective permissions are passed down the chain using new fields in
struct pg_state.

The KASAN optimisation is implemented by setting action=CONTINUE in the
callbacks to skip an entire tree of entries.

Link: http://lkml.kernel.org/r/20191218162402.45610-21-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dab01984 21-Jan-2020 Christoph Hellwig <hch@lst.de>

x86/PCI: Remove X86_DEV_DMA_OPS

There are no users of X86_DEV_DMA_OPS left, so remove the code.

Link: https://lore.kernel.org/r/1579613871-301529-8-git-send-email-jonathan.derrick@intel.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>


# 4ba68d00 23-Jan-2020 Dave Hansen <dave.hansen@linux.intel.com>

x86/mpx: remove build infrastructure

From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

Remove the Kconfig option and the Makefile line. This makes
arch/x86/mm/mpx.c and anything under an #ifdef for
X86_INTEL_MPX dead code.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>


# e79f15a4 15-Jan-2020 Chen Yu <yu.c.chen@intel.com>

x86/resctrl: Add task resctrl information display

Monitoring tools that want to find out which resctrl control and monitor
groups a task belongs to must currently read the "tasks" file in every
group until they locate the process ID.

Add an additional file /proc/{pid}/cpu_resctrl_groups to provide this
information:

1) res:
mon:

resctrl is not available.

2) res:/
mon:

Task is part of the root resctrl control group, and it is not associated
to any monitor group.

3) res:/
mon:mon0

Task is part of the root resctrl control group and monitor group mon0.

4) res:group0
mon:

Task is part of resctrl control group group0, and it is not associated
to any monitor group.

5) res:group0
mon:mon1

Task is part of resctrl control group group0 and monitor group mon1.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jinshi Chen <jinshi.chen@intel.com>
Link: https://lkml.kernel.org/r/20200115092851.14761-1-yu.c.chen@intel.com


# 550a77a7 11-Nov-2019 Dmitry Safonov <0x7f454c46@gmail.com>

x86/vdso: Add time napespace page

To support time namespaces in the VDSO with a minimal impact on regular non
time namespace affected tasks, the namespace handling needs to be hidden in
a slow path.

The most obvious place is vdso_seq_begin(). If a task belongs to a time
namespace then the VVAR page which contains the system wide VDSO data is
replaced with a namespace specific page which has the same layout as the
VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

Allocate the time namespace page among VVAR pages and place vdso_data on
it. Provide __arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-23-dima@arista.com


# 8f24f8c2 24-Dec-2019 Ard Biesheuvel <ardb@kernel.org>

efi/libstub: Annotate firmware routines as __efiapi

Annotate all the firmware routines (boot services, runtime services and
protocol methods) called in the boot context as __efiapi, and make
it expand to __attribute__((ms_abi)) on 64-bit x86. This allows us
to use the compiler to generate the calls into firmware that use the
MS calling convention instead of the SysV one.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-13-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e133f6ea 03-Dec-2019 Randy Dunlap <rdunlap@infradead.org>

x86/Kconfig: Correct spelling and punctuation

End a sentence with a period (aka full stop) in Kconfig help text. Fix
minor NUMA-related Kconfig text:

- Use capital letters for NUMA acronym.
- Hyphenate Non-Uniform.

[ bp: Merge into a single patch. ]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: X86 ML <x86@kernel.org>
Link: https://lkml.kernel.org/r/443ed0a8-783d-6c7c-3258-e1c44df03fd7@infradead.org


# 10916706 03-Dec-2019 Shile Zhang <shile.zhang@linux.alibaba.com>

scripts/sorttable: Rename 'sortextable' to 'sorttable'

Use a more generic name for additional table sorting usecases,
such as the upcoming ORC table sorting feature. This tool is
not tied to exception table sorting anymore.

No functional changes intended.

[ mingo: Rewrote the changelog. ]

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: linux-kbuild@vger.kernel.org
Link: https://lkml.kernel.org/r/20191204004633.88660-6-shile.zhang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 81c22041 09-Dec-2019 Daniel Borkmann <daniel@iogearbox.net>

bpf, x86, arm64: Enable jit by default when not built as always-on

After Spectre 2 fix via 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON
config") most major distros use BPF_JIT_ALWAYS_ON configuration these days
which compiles out the BPF interpreter entirely and always enables the
JIT. Also given recent fix in e1608f3fa857 ("bpf: Avoid setting bpf insns
pages read-only when prog is jited"), we additionally avoid fragmenting
the direct map for the BPF insns pages sitting in the general data heap
since they are not used during execution. Latter is only needed when run
through the interpreter.

Since both x86 and arm64 JITs have seen a lot of exposure over the years,
are generally most up to date and maintained, there is more downside in
!BPF_JIT_ALWAYS_ON configurations to have the interpreter enabled by default
rather than the JIT. Add a ARCH_WANT_DEFAULT_BPF_JIT config which archs can
use to set the bpf_jit_{enable,kallsyms} to 1. Back in the days the
bpf_jit_kallsyms knob was set to 0 by default since major distros still
had /proc/kallsyms addresses exposed to unprivileged user space which is
not the case anymore. Hence both knobs are set via BPF_JIT_DEFAULT_ON which
is set to 'y' in case of BPF_JIT_ALWAYS_ON or ARCH_WANT_DEFAULT_BPF_JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/f78ad24795c2966efcc2ee19025fa3459f622185.1575903816.git.daniel@iogearbox.net


# b03b016f 20-Nov-2019 Krzysztof Kozlowski <krzk@kernel.org>

x86/Kconfig: Fix Kconfig indentation

Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:

$ sed -e 's/^ /\t/' -i */Kconfig

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/1574306470-10305-1-git-send-email-krzk@kernel.org


# b75baaf3 20-Nov-2019 Ingo Molnar <mingo@kernel.org>

x86/mm/pat: Fix typo in the Kconfig help text

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0609ae01 30-Nov-2019 Daniel Axtens <dja@axtens.net>

x86/kasan: support KASAN_VMALLOC

In the case where KASAN directly allocates memory to back vmalloc space,
don't map the early shadow page over it.

We prepopulate pgds/p4ds for the range that would otherwise be empty.
This is required to get it synced to hardware on boot, allowing the
lower levels of the page tables to be filled dynamically.

Link: http://lkml.kernel.org/r/20191031093909.9228-5-dja@axtens.net
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fb041bb7 21-Nov-2019 Will Deacon <will@kernel.org>

locking/refcount: Consolidate implementations of refcount_t

The generic implementation of refcount_t should be good enough for
everybody, so remove ARCH_HAS_REFCOUNT and REFCOUNT_FULL entirely,
leaving the generic implementation enabled unconditionally.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-9-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5cbaefe9 20-Nov-2019 Ingo Molnar <mingo@kernel.org>

kcsan: Improve various small stylistic details

Tidy up a few bits:

- Fix typos and grammar, improve wording.

- Remove spurious newlines that are col80 warning artifacts where the
resulting line-break is worse than the disease it's curing.

- Use core kernel coding style to improve readability and reduce
spurious code pattern variations.

- Use better vertical alignment for structure definitions and initialization
sequences.

- Misc other small details.

No change in functionality intended.

Cc: linux-kernel@vger.kernel.org
Cc: Marco Elver <elver@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c12d3362 08-Nov-2019 Ard Biesheuvel <ardb@kernel.org>

int128: move __uint128_t compiler test to Kconfig

In order to use 128-bit integer arithmetic in C code, the architecture
needs to have declared support for it by setting ARCH_SUPPORTS_INT128,
and it requires a version of the toolchain that supports this at build
time. This is why all existing tests for ARCH_SUPPORTS_INT128 also test
whether __SIZEOF_INT128__ is defined, since this is only the case for
compilers that can support 128-bit integers.

Let's fold this additional test into the Kconfig declaration of
ARCH_SUPPORTS_INT128 so that we can also use the symbol in Makefiles,
e.g., to decide whether a certain object needs to be included in the
first place.

Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 40d04110 14-Nov-2019 Marco Elver <elver@google.com>

x86, kcsan: Enable KCSAN for x86

This patch enables KCSAN for x86, with updates to build rules to not use
KCSAN for several incompatible compilation units.

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>


# 111e7b15 12-Nov-2019 Thomas Gleixner <tglx@linutronix.de>

x86/ioperm: Extend IOPL config to control ioperm() as well

If iopl() is disabled, then providing ioperm() does not make much sense.

Rename the config option and disable/enable both syscalls with it. Guard
the code with #ifdefs where appropriate.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# a24ca997 11-Nov-2019 Thomas Gleixner <tglx@linutronix.de>

x86/iopl: Remove legacy IOPL option

The IOPL emulation via the I/O bitmap is sufficient. Remove the legacy
cruft dealing with the (e)flags based IOPL mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com> (Paravirt and Xen parts)
Acked-by: Andy Lutomirski <luto@kernel.org>


# c8137ace 11-Nov-2019 Thomas Gleixner <tglx@linutronix.de>

x86/iopl: Restrict iopl() permission scope

The access to the full I/O port range can be also provided by the TSS I/O
bitmap, but that would require to copy 8k of data on scheduling in the
task. As shown with the sched out optimization TSS.io_bitmap_base can be
used to switch the incoming task to a preallocated I/O bitmap which has all
bits zero, i.e. allows access to all I/O ports.

Implementing this allows to provide an iopl() emulation mode which restricts
the IOPL level 3 permissions to I/O port access but removes the STI/CLI
permission which is coming with the hardware IOPL mechansim.

Provide a config option to switch IOPL to emulation mode, make it the
default and while at it also provide an option to disable IOPL completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>


# 90dc392f 13-Nov-2019 Christoph Hellwig <hch@lst.de>

x86: Remove the calgary IOMMU driver

The calgary IOMMU was only used on high-end IBM systems in the early
x86_64 age and has no known users left. Remove it to avoid having to
touch it for pending changes to the DMA API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191113071836.21041-2-hch@lst.de


# 562955fe 08-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace/x86: Add register_ftrace_direct() for custom trampolines

Enable x86 to allow for register_ftrace_direct(), where a custom trampoline
may be called directly from an ftrace mcount/fentry location.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# e380a039 07-Nov-2019 Nicolas Saenz Julienne <nsaenzjulienne@suse.de>

x86/PCI: sta2x11: use default DMA address translation

The devices found behind this PCIe chip have unusual DMA mapping
constraints as there is an AMBA interconnect placed in between them and
the different PCI endpoints. The offset between physical memory
addresses and AMBA's view is provided by reading a PCI config register,
which is saved and used whenever DMA mapping is needed.

It turns out that this DMA setup can be represented by properly setting
'dma_pfn_offset', 'dma_bus_mask' and 'dma_mask' during the PCI device
enable fixup. And ultimately allows us to get rid of this device's
custom DMA functions.

Aside from the code deletion and DMA setup, sta2x11_pdev_to_mapping() is
moved to avoid warnings whenever CONFIG_PM is not enabled.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# b971880f 05-Nov-2019 Babu Moger <Babu.Moger@amd.com>

x86/Kconfig: Rename UMIP config parameter

AMD 2nd generation EPYC processors support the UMIP (User-Mode
Instruction Prevention) feature. So, rename X86_INTEL_UMIP to
generic X86_UMIP and modify the text to cover both Intel and AMD.

[ bp: take of the disabled-features.h copy in tools/ too. ]

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/157298912544.17462.2018334793891409521.stgit@naples-babu.amd.com


# db616173 22-Oct-2019 Michal Hocko <mhocko@suse.com>

x86/tsx: Add config options to set tsx=on|off|auto

There is a general consensus that TSX usage is not largely spread while
the history shows there is a non trivial space for side channel attacks
possible. Therefore the tsx is disabled by default even on platforms
that might have a safe implementation of TSX according to the current
knowledge. This is a fair trade off to make.

There are, however, workloads that really do benefit from using TSX and
updating to a newer kernel with TSX disabled might introduce a
noticeable regressions. This would be especially a problem for Linux
distributions which will provide TAA mitigations.

Introduce config options X86_INTEL_TSX_MODE_OFF, X86_INTEL_TSX_MODE_ON
and X86_INTEL_TSX_MODE_AUTO to control the TSX feature. The config
setting can be overridden by the tsx cmdline options.

[ bp: Text cleanups from Josh. ]

Suggested-by: Borislav Petkov <bpetkov@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>


# 1edae1ae 12-Oct-2019 Scott Wood <swood@redhat.com>

x86/Kconfig: Enforce limit of 512 CPUs with MAXSMP and no CPUMASK_OFFSTACK

The help text of NR_CPUS says that the maximum number of CPUs supported
without CPUMASK_OFFSTACK is 512. However, NR_CPUS_RANGE_END allows this
limit to be bypassed by MAXSMP even if CPUMASK_OFFSTACK is not set.

This scenario can currently only happen in the RT tree, since it has
"select CPUMASK_OFFSTACK if !PREEMPT_RT_FULL" in MAXSMP. However,
even if we ignore the RT tree, checking for MAXSMP in addition to
CPUMASK_OFFSTACK is redundant.

Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191012070054.28657-1-swood@redhat.com


# 87d6021b 01-Oct-2019 Arnd Bergmann <arnd@arndb.de>

x86/math-emu: Limit MATH_EMULATION to 486SX compatibles

The FPU emulation code is old and fragile in places, try to limit its
use to builds for CPUs that actually use it. As far as I can tell,
this is only true for i486sx compatibles, including the Cyrix 486SLC,
AMD Am486SX and ÉLAN SC410, UMC U5S amd DM&P VortexSX86, all of which
were relatively short-lived and got replaced with i486DX compatible
processors soon after introduction, though some of the embedded versions
remained available much longer.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bill Metzenthen <billm@melbpc.org.au>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191001142344.1274185-2-arnd@arndb.de


# 18ec1eaf 12-Sep-2019 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Enable 5-level paging support by default

Support of boot-time switching between 4- and 5-level paging mode is
upstream since 4.17.

We run internal testing with 5-level paging support enabled for a while
and it doesn't not cause any functional or performance regression on
4-level paging hardware.

The only 5-level paging related regressions I saw were in early boot
code that runs independently from CONFIG_X86_5LEVEL.

The next major release of distributions expected to have
CONFIG_X86_5LEVEL=y.

Enable the option by default. It may help to catch obscure bugs early.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: https://lkml.kernel.org/r/20190913095452.40592-1-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2ff2b7ec 18-Aug-2019 Masahiro Yamada <yamada.masahiro@socionext.com>

kbuild: add CONFIG_ASM_MODVERSIONS

Add CONFIG_ASM_MODVERSIONS. This allows to remove one if-conditional
nesting in scripts/Makefile.build.

scripts/Makefile.build is run every time Kbuild descends into a
sub-directory. So, I want to avoid $(wildcard ...) evaluation
where possible although computing $(wildcard ...) is so cheap that
it may not make measurable performance difference.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>


# 99d5cadf 19-Aug-2019 Jiri Bohac <jbohac@suse.cz>

kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE

This is a preparatory patch for kexec_file_load() lockdown. A locked down
kernel needs to prevent unsigned kernel images from being loaded with
kexec_file_load(). Currently, the only way to force the signature
verification is compiling with KEXEC_VERIFY_SIG. This prevents loading
usigned images even when the kernel is not locked down at runtime.

This patch splits KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE.
Analogous to the MODULE_SIG and MODULE_SIG_FORCE for modules, KEXEC_SIG
turns on the signature verification but allows unsigned images to be
loaded. KEXEC_SIG_FORCE disallows images without a valid signature.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
cc: kexec@lists.infradead.org
Signed-off-by: James Morris <jmorris@namei.org>


# 2e1da13f 07-Aug-2019 Vlastimil Babka <vbabka@suse.cz>

x86/kconfig: Remove X86_DIRECT_GBPAGES dependency on !DEBUG_PAGEALLOC

These days CONFIG_DEBUG_PAGEALLOC just compiles in the code that has to be
enabled on boot time, or with an extra config option, and only then are the
large page based direct mappings disabled.

Therefore remove the config dependency, allowing 1GB direct mappings with
debug_pagealloc compiled in but not enabled.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190807130258.22185-1-vbabka@suse.cz


# 0c9c1d56 05-Aug-2019 Thiago Jung Bauermann <bauerman@linux.ibm.com>

x86, s390: Move ARCH_HAS_MEM_ENCRYPT definition to arch/Kconfig

powerpc is also going to use this feature, so put it in a generic location.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190806044919.10622-2-bauerman@linux.ibm.com


# a1c4423b 03-Jul-2019 Marcelo Tosatti <mtosatti@redhat.com>

cpuidle-haltpoll: disable host side polling when kvm virtualized

When performing guest side polling, it is not necessary to
also perform host side polling.

So disable host side polling, via the new MSR interface,
when loading cpuidle-haltpoll driver.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 17596731 16-Jul-2019 Robin Murphy <robin.murphy@arm.com>

mm: introduce ARCH_HAS_PTE_DEVMAP

ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
with the long-out-of-date comment can lead to the impression than an
architecture may just enable it (since __add_pages() now "comprehends
device memory" for itself) and expect things to work.

In practice, however, ZONE_DEVICE users have little chance of
functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
dependency so the real situation is clearer.

Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9087c375 10-Jul-2019 Tom Lendacky <thomas.lendacky@amd.com>

dma-direct: Force unencrypted DMA under SME for certain DMA masks

If a device doesn't support DMA to a physical address that includes the
encryption bit (currently bit 47, so 48-bit DMA), then the DMA must
occur to unencrypted memory. SWIOTLB is used to satisfy that requirement
if an IOMMU is not active (enabled or configured in passthrough mode).

However, commit fafadcd16595 ("swiotlb: don't dip into swiotlb pool for
coherent allocations") modified the coherent allocation support in
SWIOTLB to use the DMA direct coherent allocation support. When an IOMMU
is not active, this resulted in dma_alloc_coherent() failing for devices
that didn't support DMA addresses that included the encryption bit.

Addressing this requires changes to the force_dma_unencrypted() function
in kernel/dma/direct.c. Since the function is now non-trivial and
SME/SEV specific, update the DMA direct support to add an arch override
for the force_dma_unencrypted() function. The arch override is selected
when CONFIG_AMD_MEM_ENCRYPT is set. The arch override function resides in
the arch/x86/mm/mem_encrypt.c file and forces unencrypted DMA when either
SEV is active or SME is active and the device does not support DMA to
physical addresses that include the encryption bit.

Fixes: fafadcd16595 ("swiotlb: don't dip into swiotlb pool for coherent allocations")
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
[hch: moved the force_dma_unencrypted declaration to dma-mapping.h,
fold the s390 fix from Halil Pasic]
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4f4cfa6c 27-Jun-2019 Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

docs: admin-guide: add a series of orphaned documents

There are lots of documents that belong to the admin-guide but
are on random places (most under Documentation root dir).

Move them to the admin guide.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>


# 330d4810 13-Jun-2019 Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

docs: admin-guide: add kdump documentation into it

The Kdump documentation describes procedures with admins use
in order to solve issues on their systems.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>


# 67a929e0 11-Jul-2019 Christoph Hellwig <hch@lst.de>

mm: rename CONFIG_HAVE_GENERIC_GUP to CONFIG_HAVE_FAST_GUP

We only support the generic GUP now, so rename the config option to
be more clear, and always use the mm/Kconfig definition of the
symbol and select it from the arch Kconfigs.

Link: http://lkml.kernel.org/r/20190625143715.1689-11-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: James Hogan <jhogan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 39656e83 11-Jul-2019 Christoph Hellwig <hch@lst.de>

mm: lift the x86_32 PAE version of gup_get_pte to common code

The split low/high access is the only non-READ_ONCE version of gup_get_pte
that did show up in the various arch implemenations. Lift it to common
code and drop the ifdef based arch override.

Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: James Hogan <jhogan@kernel.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3876d4a3 27-Jun-2019 Alexandre Ghiti <alex@ghiti.fr>

x86, arm64: Move ARCH_WANT_HUGE_PMD_SHARE config in arch/Kconfig

ARCH_WANT_HUGE_PMD_SHARE config was declared in both architectures:
move this declaration in arch/Kconfig and make those architectures
select it.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>


# 625b7b7f 26-Jun-2019 Andy Lutomirski <luto@kernel.org>

x86/vsyscall: Change the default vsyscall mode to xonly

The use case for full emulation over xonly is very esoteric, e.g. magic
instrumentation tools.

Change the default to the safer xonly mode.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/30539f8072d2376b9c9efcc07e6ed0d6bf20e882.1561610354.git.luto@kernel.org


# bd49e16e 26-Jun-2019 Andy Lutomirski <luto@kernel.org>

x86/vsyscall: Add a new vsyscall=xonly mode

With vsyscall emulation on, a readable vsyscall page is still exposed that
contains syscall instructions that validly implement the vsyscalls.

This is required because certain dynamic binary instrumentation tools
attempt to read the call targets of call instructions in the instrumented
code. If the instrumented code uses vsyscalls, then the vsyscall page needs
to contain readable code.

Unfortunately, leaving readable memory at a deterministic address can be
used to help various ASLR bypasses, so some hardening value can be gained
by disallowing vsyscall reads.

Given how rarely the vsyscall page needs to be readable, add a mechanism to
make the vsyscall page be execute only.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/d17655777c21bc09a7af1bbcf74e6f2b69a51152.1561610354.git.luto@kernel.org


# 7ac87074 21-Jun-2019 Vincenzo Frascino <vincenzo.frascino@arm.com>

x86/vdso: Switch to generic vDSO implementation

The x86 vDSO library requires some adaptations to take advantage of the
newly introduced generic vDSO library.

Introduce the following changes:
- Modification of vdso.c to be compliant with the common vdso datapage
- Use of lib/vdso for gettimeofday

[ tglx: Massaged changelog and cleaned up the function signature formatting ]

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Cc: Shijith Thotton <sthotton@marvell.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-23-vincenzo.frascino@arm.com


# 151f4e2b 13-Jun-2019 Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

docs: power: convert docs to ReST and rename to *.rst

Convert the PM documents to ReST, in order to allow them to
build with Sphinx.

The conversion is actually:
- add blank lines and indentation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>


# d67297ad 12-Jun-2019 Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

docs: kdump: convert docs to ReST and rename to *.rst

Convert kdump documentation to ReST and add it to the
user faced manual, as the documents are mainly focused on
sysadmins that would be enabling kdump.

Note: the vmcoreinfo.rst has one very long title on one of its
sub-sections:

PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy)|PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)

I opted to break this one, into two entries with the same content,
in order to make it easier to display after being parsed in html and PDF.

The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# 0c608dad 05-Jun-2019 Aubrey Li <aubrey.li@linux.intel.com>

x86/process: Add AVX-512 usage elapsed time to /proc/pid/arch_status

AVX-512 components usage can result in turbo frequency drop. So it's useful
to expose AVX-512 usage elapsed time as a heuristic hint for user space job
schedulers to cluster the AVX-512 using tasks together.

Examples:
$ while [ 1 ]; do cat /proc/tid/arch_status | grep AVX512; sleep 1; done
AVX512_elapsed_ms: 4
AVX512_elapsed_ms: 8
AVX512_elapsed_ms: 4

This means that 4 milliseconds have elapsed since the tsks AVX512 usage was
detected when the task was scheduled out.

$ cat /proc/tid/arch_status | grep AVX512
AVX512_elapsed_ms: -1

'-1' indicates that no AVX512 usage was recorded for this task.

The time exposed is not necessarily accurate when the arch_status file is
read as the AVX512 usage is only evaluated when a task is scheduled
out. Accurate usage information can be obtained with performance counters.

[ tglx: Massaged changelog ]

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: peterz@infradead.org
Cc: hpa@zytor.com
Cc: ak@linux.intel.com
Cc: tim.c.chen@linux.intel.com
Cc: dave.hansen@intel.com
Cc: arjan@linux.intel.com
Cc: adobriyan@gmail.com
Cc: aubrey.li@intel.com
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux API <linux-api@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190606012236.9391-2-aubrey.li@linux.intel.com


# 498ad393 29-Apr-2019 Zhao Yakui <yakui.zhao@intel.com>

x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector

Use the HYPERVISOR_CALLBACK_VECTOR to notify an ACRN guest.

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1559108037-18813-4-git-send-email-yakui.zhao@intel.com


# ec7972c9 29-Apr-2019 Zhao Yakui <yakui.zhao@intel.com>

x86: Add support for Linux guests on an ACRN hypervisor

ACRN is an open-source hypervisor maintained by The Linux Foundation. It
is built for embedded IOT with small footprint and real-time features.
Add ACRN guest support so that it allows Linux to be booted under the
ACRN hypervisor. This adds only the barebones implementation.

[ bp: Massage commit message and help text. ]

Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1559108037-18813-3-git-send-email-yakui.zhao@intel.com


# ecca2502 29-Apr-2019 Zhao Yakui <yakui.zhao@intel.com>

x86/Kconfig: Add new X86_HV_CALLBACK_VECTOR config symbol

Add a special Kconfig symbol X86_HV_CALLBACK_VECTOR so that the guests
using the hypervisor interrupt callback counter can select and thus
enable that counter. Select it when xen or hyperv support is enabled. No
functional changes.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: linux-hyperv@vger.kernel.org
Cc: Nicolai Stange <nstange@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/1559108037-18813-2-git-send-email-yakui.zhao@intel.com


# cb1aaebe 07-Jun-2019 Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

docs: fix broken documentation links

Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Wolfram Sang <wsa@the-dreams.de>
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# 1eecbcdc 07-Jun-2019 Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

docs: move protection-keys.rst to the core-api book

This document is used by multiple architectures:

$ echo $(git grep -l pkey_mprotect arch|cut -d'/' -f 2|sort|uniq)
alpha arm arm64 ia64 m68k microblaze mips parisc powerpc s390 sh sparc x86 xtensa

So, let's move it to the core book and adjust the links to it
accordingly.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# 0c3d931b 13-May-2019 Lubomir Rintel <lkundrak@v3.sk>

Platform: OLPC: Add XO-1.75 EC driver

It's based off the driver from the OLPC kernel sources. Somewhat
modernized and cleaned up, for better or worse.

Modified to plug into the olpc-ec driver infrastructure (so that battery
interface and debugfs could be reused) and the SPI slave framework.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# 9012d011 14-May-2019 Masahiro Yamada <yamada.masahiro@socionext.com>

compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING

Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.

The idea is obviously arch-agnostic. This commit moves the config entry
from arch/x86/Kconfig.debug to lib/Kconfig.debug so that all
architectures can benefit from it.

This can make a huge difference in kernel image size especially when
CONFIG_OPTIMIZE_FOR_SIZE is enabled.

For example, I got 3.5% smaller arm64 kernel for v5.1-rc1.

dec file
18983424 arch/arm64/boot/Image.before
18321920 arch/arm64/boot/Image.after

This also slightly improves the "Kernel hacking" Kconfig menu as
e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen') suggested;
this config option would be a good fit in the "compiler option" menu.

Link: http://lkml.kernel.org/r/20190423034959.13525-12-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Brezillon <bbrezillon@kernel.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 350e88ba 13-May-2019 Mike Rapoport <rppt@kernel.org>

mm: memblock: make keeping memblock memory opt-in rather than opt-out

Most architectures do not need the memblock memory after the page
allocator is initialized, but only few enable ARCH_DISCARD_MEMBLOCK in the
arch Kconfig.

Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the
logic makes it clear which architectures actually use memblock after
system initialization and skips the necessity to add ARCH_DISCARD_MEMBLOCK
to the architectures that are still missing that option.

Link: http://lkml.kernel.org/r/1556102150-32517-1-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4eb0716e 13-May-2019 Alexandre Ghiti <alex@ghiti.fr>

hugetlb: allow to free gigantic pages regardless of the configuration

On systems without CONTIG_ALLOC activated but that support gigantic pages,
boottime reserved gigantic pages can not be freed at all. This patch
simply enables the possibility to hand back those pages to memory
allocator.

Link: http://lkml.kernel.org/r/20190327063626.18421-5-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Acked-by: David S. Miller <davem@davemloft.net> [sparc]
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8df995f6 13-May-2019 Alexandre Ghiti <alex@ghiti.fr>

mm: simplify MEMORY_ISOLATION && COMPACTION || CMA into CONTIG_ALLOC

This condition allows to define alloc_contig_range, so simplify it into a
more accurate naming.

Link: http://lkml.kernel.org/r/20190327063626.18421-4-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 409ca455 12-May-2019 Masahiro Yamada <yamada.masahiro@socionext.com>

x86/kconfig: Disable CONFIG_GENERIC_HWEIGHT and remove __HAVE_ARCH_SW_HWEIGHT

Remove an unnecessary arch complication:

arch/x86/include/asm/arch_hweight.h uses __sw_hweight{32,64} as
alternatives, and they are implemented in arch/x86/lib/hweight.S

x86 does not rely on the generic C implementation lib/hweight.c
at all, so CONFIG_GENERIC_HWEIGHT should be disabled.

__HAVE_ARCH_SW_HWEIGHT is not necessary either.

No change in functionality intended.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: http://lkml.kernel.org/r/1557665521-17570-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 518049d9 09-May-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace/x86_32: Remove support for non DYNAMIC_FTRACE

When DYNAMIC_FTRACE is enabled in the kernel, all the functions that can be
traced by the function tracer have a "nop" placeholder at the start of the
function. When function tracing is enabled, the nop is converted into a call
to the tracing infrastructure where the functions get traced. This also
allows for specifying specific functions to trace, and a lot of
infrastructure is built on top of this.

When DYNAMIC_FTRACE is not enabled, all the functions have a call to the
ftrace trampoline. A check is made to see if a function pointer is the
ftrace_stub or not, and if it is not, it calls the function pointer to trace
the code. This adds over 10% overhead to the kernel even when tracing is
disabled.

When an architecture supports DYNAMIC_FTRACE there really is no reason to
use the static tracing. I have kept non DYNAMIC_FTRACE available in x86 so
that the generic code for non DYNAMIC_FTRACE can be tested. There is no
reason to support non DYNAMIC_FTRACE for both x86_64 and x86_32. As the non
DYNAMIC_FTRACE for x86_32 does not even support fentry, and we want to
remove mcount completely, there's no reason to keep non DYNAMIC_FTRACE
around for x86_32.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# d253ca0c 25-Apr-2019 Rick Edgecombe <rick.p.edgecombe@intel.com>

x86/mm/cpa: Add set_direct_map_*() functions

Add two new functions set_direct_map_default_noflush() and
set_direct_map_invalid_noflush() for setting the direct map alias for the
page to its default valid permissions and to an invalid state that cannot
be cached in a TLB, respectively. These functions do not flush the TLB.

Note, __kernel_map_pages() does something similar but flushes the TLB and
doesn't reset the permission bits to default on all architectures.

Also add an ARCH config ARCH_HAS_SET_DIRECT_MAP for specifying whether
these have an actual implementation or a default empty one.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-15-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3599fe12 25-Apr-2019 Thomas Gleixner <tglx@linutronix.de>

x86/stacktrace: Use common infrastructure

Replace the stack_trace_save*() functions with the new arch_stack_walk()
interfaces.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: linux-arch@vger.kernel.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: dm-devel@redhat.com
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20190425094803.816485461@linutronix.de


# 2792107d 24-Apr-2019 Mike Rapoport <rppt@kernel.org>

x86/Kconfig: Deprecate DISCONTIGMEM support for 32-bit x86

Mel Gorman says:

"32-bit NUMA systems should be non-existent in practice. The last NUMA
system I'm aware of that was both NUMA and 32-bit only died somewhere
between 2004 and 2007. If someone is running a 64-bit capable system in
32-bit mode with NUMA, they really are just punishing themselves for fun."

Mark DISCONTIGMEM broken for now as suggested by Christoph Hellwig,
and (hopefully) remove it in a couple of releases.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1556112252-9339-3-git-send-email-rppt@linux.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6ad57f7f 24-Apr-2019 Mike Rapoport <rppt@kernel.org>

x86/Kconfig: Make SPARSEMEM default for 32-bit x86

Sparsemem has been a default memory model for x86-64 for over a decade,
since:

b263295dbffd ("x86: 64-bit, make sparsemem vmemmap the only memory model").

Make it the default for 32-bit NUMA systems (if there any left) as well.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1556112252-9339-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 15854edd 10-Apr-2019 Christoph Hellwig <hch@lst.de>

x86/pci: Clean up usage of X86_DEV_DMA_OPS

We have supported per-device dma_map_ops in generic code for a long
time, and this symbol just guards the inclusion of the dma_map_ops
registry used for vmd. Stop enabling it for anything but vmd.

No change in functionality intended.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190410080220.21705-3-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5dd50aae 05-Nov-2018 David Howells <dhowells@redhat.com>

Make anon_inodes unconditional

Make the anon_inodes facility unconditional so that it can be used by core
VFS code and pidfd code.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[christian@brauner.io: adapt commit message to mention pidfds]
Signed-off-by: Christian Brauner <christian@brauner.io>


# 117ed454 14-Apr-2019 Thomas Gleixner <tglx@linutronix.de>

x86/irq/64: Remove stack overflow debug code

All stack types on x86 64-bit have guard pages now.

So there is no point in executing probabilistic overflow checks as the
guard pages are a accurate and reliable overflow prevention.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160146.466354762@linutronix.de


# a943245a 16-Apr-2019 Colin Ian King <colin.king@canonical.com>

x86/Kconfig: Fix spelling mistake "effectivness" -> "effectiveness"

The Kconfig text contains a spelling mistake, fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20190416105751.18899-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c02f48e0 04-Apr-2019 Borislav Petkov <bp@suse.de>

x86/microcode: Deprecate MICROCODE_OLD_INTERFACE

This is the ancient loading interface which requires special tools to be
used. The bigger problem with it is that it is as inadequate for proper
loading of microcode as the late microcode loading interface is because
it happens just as late.

iucode_tool's manpage already warns people that it is deprecated.

Deprecate it and turn it off by default along with a big fat warning in
the Kconfig help. It will go away sometime in the future.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190405133010.24249-4-bp@alien8.de


# fb346fd9 04-Apr-2019 Waiman Long <longman@redhat.com>

locking/lock_events: Make lock_events available for all archs & other locks

The QUEUED_LOCK_STAT option to report queued spinlocks event counts
was previously allowed only on x86 architecture. To make the locking
event counting code more useful, it is now renamed to a more generic
LOCK_EVENT_COUNTS config option. This new option will be available to
all the architectures that use qspinlock at the moment.

Other locking code can now start to use the generic locking event
counting code by including lock_events.h and put the new locking event
names into the lock_events_list.h header file.

My experience with lock event counting is that it gives valuable insight
on how the locking code works and what can be done to make it better. I
would like to extend this benefit to other locking code like mutex and
rwsem in the near future.

The PV qspinlock specific code will stay in qspinlock_stat.h. The
locking event counters will now reside in the <debugfs>/lock_event_counts
directory.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20190404174320.22416-9-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# a5881bea 10-Apr-2019 Christoph Hellwig <hch@lst.de>

x86/Kconfig: Remove the unused X86_DMA_REMAP KConfig symbol

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190410080220.21705-2-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 390a0c62 22-Mar-2019 Waiman Long <longman@redhat.com>

locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs

Currently, we have two different implementation of rwsem:

1) CONFIG_RWSEM_GENERIC_SPINLOCK (rwsem-spinlock.c)
2) CONFIG_RWSEM_XCHGADD_ALGORITHM (rwsem-xadd.c)

As we are going to use a single generic implementation for rwsem-xadd.c
and no architecture-specific code will be needed, there is no point
in keeping two different implementations of rwsem. In most cases, the
performance of rwsem-spinlock.c will be worse. It also doesn't get all
the performance tuning and optimizations that had been implemented in
rwsem-xadd.c over the years.

For simplication, we are going to remove rwsem-spinlock.c and make all
architectures use a single implementation of rwsem - rwsem-xadd.c.

All references to RWSEM_GENERIC_SPINLOCK and RWSEM_XCHGADD_ALGORITHM
in the code are removed.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: nios2-dev@lists.rocketboards.org
Cc: openrisc@lists.librecores.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Link: https://lkml.kernel.org/r/20190322143008.21313-3-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 96bc9567 19-Sep-2018 Peter Zijlstra <peterz@infradead.org>

asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE

Make issuing a TLB invalidate for page-table pages the normal case.

The reason is twofold:

- too many invalidates is safer than too few,
- most architectures use the linux page-tables natively
and would thus require this.

Make it an opt-out, instead of an opt-in.

No change in behavior intended.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# bebd024e 26-Mar-2019 Thomas Gleixner <tglx@linutronix.de>

x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y

The SMT disable 'nosmt' command line argument is not working properly when
CONFIG_HOTPLUG_CPU is disabled. The teardown of the sibling CPUs which are
required to be brought up due to the MCE issues, cannot work. The CPUs are
then kept in a half dead state.

As the 'nosmt' functionality has become popular due to the speculative
hardware vulnerabilities, the half torn down state is not a proper solution
to the problem.

Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is
possible.

Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190326163811.598166056@linutronix.de


# dadd2299 05-Nov-2018 David Howells <dhowells@redhat.com>

Make anon_inodes unconditional

Make the anon_inodes facility unconditional so that it can be used by core
VFS code.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# eac61655 05-Mar-2019 Borislav Petkov <bp@suse.de>

x86: Deprecate a.out support

Linux supports ELF binaries for ~25 years now. a.out coredumping has
bitrotten quite significantly and would need some fixing to get it into
shape again but considering how even the toolchains cannot create a.out
executables in its default configuration, let's deprecate a.out support
and remove it a couple of releases later, instead.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Jann Horn <jannh@google.com>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <x86@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ff4c25f2 03-Feb-2019 Christoph Hellwig <hch@lst.de>

dma-mapping: improve selection of dma_declare_coherent availability

This API is primarily used through DT entries, but two architectures
and two drivers call it directly. So instead of selecting the config
symbol for random architectures pull it in implicitly for the actual
users. Also rename the Kconfig option to describe the feature better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS
Acked-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 942fa985 16-May-2018 Yury Norov <ynorov@caviumnetworks.com>

32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option

All new 32-bit architectures should have 64-bit userspace off_t type, but
existing architectures has 32-bit ones.

To enforce the rule, new config option is added to arch/Kconfig that defaults
ARCH_32BIT_OFF_T to be disabled for new 32-bit architectures. All existing
32-bit architectures enable it explicitly.

New option affects force_o_largefile() behaviour. Namely, if userspace
off_t is 64-bits long, we have no reason to reject user to open big files.

Note that even if architectures has only 64-bit off_t in the kernel
(arc, c6x, h8300, hexagon, nios2, openrisc, and unicore32),
a libc may use 32-bit off_t, and therefore want to limit the file size
to 4GB unless specified differently in the open flags.

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yury Norov <ynorov@marvell.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# ce9084ba 02-Feb-2019 Ard Biesheuvel <ardb@kernel.org>

x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol

Turn ARCH_USE_MEMREMAP_PROT into a generic Kconfig symbol, and fix the
dependency expression to reflect that AMD_MEM_ENCRYPT depends on it,
instead of the other way around. This will permit ARCH_USE_MEMREMAP_PROT
to be selected by other architectures.

Note that the encryption related early memremap routines in
arch/x86/mm/ioremap.c cannot be built for 32-bit x86 without triggering
the following warning:

arch/x86//mm/ioremap.c: In function 'early_memremap_encrypted':
>> arch/x86/include/asm/pgtable_types.h:193:27: warning: conversion from
'long long unsigned int' to 'long unsigned int' changes
value from '9223372036854776163' to '355' [-Woverflow]
#define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC)
^~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86//mm/ioremap.c:713:46: note: in expansion of macro '__PAGE_KERNEL_ENC'
return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);

which essentially means they are 64-bit only anyway. However, we cannot
make them dependent on CONFIG_ARCH_HAS_MEM_ENCRYPT, since that is always
defined, even for i386 (and changing that results in a slew of build errors)

So instead, build those routines only if CONFIG_AMD_MEM_ENCRYPT is
defined.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190202094119.13230-9-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e6d42931 29-Jan-2019 Johannes Weiner <hannes@cmpxchg.org>

x86/resctrl: Avoid confusion over the new X86_RESCTRL config

"Resource Control" is a very broad term for this CPU feature, and a term
that is also associated with containers, cgroups etc. This can easily
cause confusion.

Make the user prompt more specific. Match the config symbol name.

[ bp: In the future, the corresponding ARM arch-specific code will be
under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out
under the CPU_RESCTRL umbrella symbol. ]

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Babu Moger <Babu.Moger@amd.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-doc@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org


# 625210cf 21-Jan-2019 Sinan Kaya <okaya@kernel.org>

x86/Kconfig: Select PCI_LOCKLESS_CONFIG if PCI is enabled

After commit

5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")

dependencies on CONFIG_PCI that previously were satisfied implicitly
through dependencies on CONFIG_ACPI have to be specified directly.

PCI_LOCKLESS_CONFIG depends on PCI but this dependency has not been
mentioned in the Kconfig so add an explicit dependency here and fix

WARNING: unmet direct dependencies detected for PCI_LOCKLESS_CONFIG
Depends on [n]: PCI [=n]
Selected by [y]:
- X86 [=y]

Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190121231958.28255-2-okaya@kernel.org


# e9820d6b 05-Jan-2019 Sinan Kaya <okaya@kernel.org>

x86/intel/lpss: Make PCI dependency explicit

After commit 5d32a66541c4 (PCI/ACPI: Allow ACPI to be built without
CONFIG_PCI set) dependencies on CONFIG_PCI that previously were
satisfied implicitly through dependencies on CONFIG_ACPI have to be
specified directly.

LPSS code relies on PCI infrastructure but this dependency has not
been called out explicitly yet.

Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5962dd22 02-Jan-2019 Sinan Kaya <okaya@kernel.org>

x86/intel/lpss: Make PCI dependency explicit

After commit

5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")

dependencies on CONFIG_PCI that previously were satisfied implicitly
through dependencies on CONFIG_ACPI have to be specified directly. LPSS
code relies on PCI infrastructure but this dependency has not been
explicitly called out so do that.

Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-acpi@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190102181038.4418-11-okaya@kernel.org


# 90802938 08-Jan-2019 Borislav Petkov <bp@suse.de>

x86/cache: Rename config option to CONFIG_X86_RESCTRL

CONFIG_RESCTRL is too generic. The final goal is to have a generic
option called like this which is selected by the arch-specific ones
CONFIG_X86_RESCTRL and CONFIG_ARM64_RESCTRL. The generic one will
cover the resctrl filesystem and other generic and shared bits of
functionality.

Signed-off-by: Borislav Petkov <bp@suse.de>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20190108171401.GC12235@zn.tnic


# 9f132f7e 03-Jan-2019 Joel Fernandes (Google) <joel@joelfernandes.org>

mm: select HAVE_MOVE_PMD on x86 for faster mremap

Moving page-tables at the PMD-level on x86 is known to be safe. Enable
this option so that we can do fast mremap when possible.

Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8636a1f9 11-Dec-2018 Masahiro Yamada <yamada.masahiro@socionext.com>

treewide: surround Kconfig file paths with double quotes

The Kconfig lexer supports special characters such as '.' and '/' in
the parameter context. In my understanding, the reason is just to
support bare file paths in the source statement.

I do not see a good reason to complicate Kconfig for the room of
ambiguity.

The majority of code already surrounds file paths with double quotes,
and it makes sense since file paths are constant string literals.

Make it treewide consistent now.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@kernel.org>


# 3731c3d4 06-Dec-2018 Christoph Hellwig <hch@lst.de>

dma-mapping: always build the direct mapping code

All architectures except for sparc64 use the dma-direct code in some
form, and even for sparc64 we had the discussion of a direct mapping
mode a while ago. In preparation for directly calling the direct
mapping code don't bother having it optionally but always build the
code in. This is a minor hardship for some powerpc and arm configs
that don't pull it in yet (although they should in a relase ot two),
and sparc64 which currently doesn't need it at all, but it will
reduce the ifdef mess we'd otherwise need significantly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>


# 7733607f 10-Dec-2018 Maran Wilson <maran.wilson@oracle.com>

xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH

In order to pave the way for hypervisors other than Xen to use the PVH
entry point for VMs, we need to factor the PVH entry code into Xen specific
and hypervisor agnostic components. The first step in doing that, is to
create a new config option for PVH entry that can be enabled
independently from CONFIG_XEN.

Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>


# 7c703e54 09-Nov-2018 Christoph Hellwig <hch@lst.de>

arch: switch the default on ARCH_HAS_SG_CHAIN

These days architectures are mostly out of the business of dealing with
struct scatterlist at all, unless they have architecture specific iommu
drivers. Replace the ARCH_HAS_SG_CHAIN symbol with a ARCH_NO_SG_CHAIN
one only enabled for architectures with horrible legacy iommu drivers
like alpha and parisc, and conditionally for arm which wants to keep it
disable for legacy platforms.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>


# dbe73364 25-Nov-2018 Thomas Gleixner <tglx@linutronix.de>

x86/Kconfig: Select SCHED_SMT if SMP enabled

CONFIG_SCHED_SMT is enabled by all distros, so there is not a real point to
have it configurable. The runtime overhead in the core scheduler code is
minimal because the actual SMT scheduling parts are conditional on a static
key.

This allows to expose the scheduler's SMT state static key to the
speculation control code. Alternatively the scheduler's static key could be
made always available when CONFIG_SMP is enabled, but that's just adding an
unused static key to every other architecture for nothing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Casey Schaufler <casey.schaufler@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jon Masters <jcm@redhat.com>
Cc: Waiman Long <longman9394@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Dave Stewart <david.c.stewart@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181125185004.337452245@linutronix.de


# 4cd24de3 02-Nov-2018 Zhenzhong Duan <zhenzhong.duan@oracle.com>

x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support

Since retpoline capable compilers are widely available, make
CONFIG_RETPOLINE hard depend on the compiler capability.

Break the build when CONFIG_RETPOLINE is enabled and the compiler does not
support it. Emit an error message in that case:

"arch/x86/Makefile:226: *** You are building kernel with non-retpoline
compiler, please update your compiler.. Stop."

[dwmw: Fail the build with non-retpoline compiler]

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Borislav Petkov <bp@suse.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: <srinivas.eeda@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/cca0cb20-f9e2-4094-840b-fb0f8810cd34@default


# 6630a8e5 15-Nov-2018 Christoph Hellwig <hch@lst.de>

eisa: consolidate EISA Kconfig entry in drivers/eisa

Let architectures opt into EISA support by selecting HAVE_EISA and
handle everything else in drivers/eisa.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# 1753d50c 15-Nov-2018 Christoph Hellwig <hch@lst.de>

rapidio: consolidate RAPIDIO config entry in drivers/rapidio

There is no good reason to duplicate the RAPIDIO menu in various
architectures. Instead provide a selectable HAVE_RAPIDIO symbol
that indicates native availability of RAPIDIO support and the handle
the rest in drivers/pci. This also means we now provide support
for PCI(e) to Rapidio bridges for every architecture instead of a
limited subset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# 8fb71ef9 15-Nov-2018 Christoph Hellwig <hch@lst.de>

pcmcia: allow PCMCIA support independent of the architecture

There is nothing architecture specific in the PCMCIA core, so allow
building it everywhere. The actual host controllers will depend on ISA,
PCI or a specific SOC.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# 2eac9c2d 15-Nov-2018 Christoph Hellwig <hch@lst.de>

PCI: consolidate the PCI_DOMAINS and PCI_DOMAINS_GENERIC config options

Move the definitions to drivers/pci and let the architectures select
them. Two small differences to before: PCI_DOMAINS_GENERIC now selects
PCI_DOMAINS, cutting down the churn for modern architectures. As the
only architectured arm did previously also offer PCI_DOMAINS as a user
visible choice in addition to selecting it from the relevant configs,
this is gone now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# eb01d42a 15-Nov-2018 Christoph Hellwig <hch@lst.de>

PCI: consolidate PCI config entry in drivers/pci

There is no good reason to duplicate the PCI menu in every architecture.
Instead provide a selectable HAVE_PCI symbol that indicates availability
of PCI support, and a FORCE_PCI symbol to for PCI on and the handle the
rest in drivers/pci.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# 6fe07ce3 21-Nov-2018 Babu Moger <Babu.Moger@amd.com>

x86/resctrl: Rename the config option INTEL_RDT to RESCTRL

The resource control feature is supported by both Intel and AMD. So,
rename CONFIG_INTEL_RDT to the vendor-neutral CONFIG_RESCTRL.

Now CONFIG_RESCTRL will be used for both Intel and AMD to enable
Resource Control support. Update the texts in config and condition
accordingly.

[ bp: Simplify Kconfig text. ]

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: <qianyue.zj@alibaba-inc.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Rian Hunter <rian@alum.mit.edu>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <xiaochen.shen@intel.com>
Link: https://lkml.kernel.org/r/20181121202811.4492-9-babu.moger@amd.com


# a48777fd 05-Nov-2018 Eial Czerwacki <eial@scalemp.com>

x86/vsmp: Remove dependency on pv_irq_ops

vSMP dependency on pv_irq_ops has been removed some years ago, but the code
still deals with pv_irq_ops.

In short, "cap & ctl & (1 << 4)" is always returning 0, so all
PARAVIRT/PARAVIRT_XXL code related to that can be removed.

However, the rest of the code depends on CONFIG_PCI, so fix it accordingly.

Rename set_vsmp_pv_ops to set_vsmp_ctl as the original name does not make
sense anymore.

Signed-off-by: Eial Czerwacki <eial@scalemp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Shai Fultheim <shai@scalemp.com>
Cc: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/1541439114-28297-1-git-send-email-eial@scalemp.com


# aca52c39 30-Oct-2018 Mike Rapoport <rppt@linux.vnet.ibm.com>

mm: remove CONFIG_HAVE_MEMBLOCK

All architecures use memblock for early memory management. There is no need
for the CONFIG_HAVE_MEMBLOCK configuration option.

[rppt@linux.vnet.ibm.com: of/fdt: fixup #ifdefs]
Link: http://lkml.kernel.org/r/20180919103457.GA20545@rapoport-lnx
[rppt@linux.vnet.ibm.com: csky: fixups after bootmem removal]
Link: http://lkml.kernel.org/r/20180926112744.GC4628@rapoport-lnx
[rppt@linux.vnet.ibm.com: remove stale #else and the code it protects]
Link: http://lkml.kernel.org/r/1538067825-24835-1-git-send-email-rppt@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1536927045-23536-4-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b4a991ec 30-Oct-2018 Mike Rapoport <rppt@linux.vnet.ibm.com>

mm: remove CONFIG_NO_BOOTMEM

All achitectures select NO_BOOTMEM which essentially becomes 'Y' for any
kernel configuration and therefore it can be removed.

[alexander.h.duyck@linux.intel.com: remove now defunct NO_BOOTMEM from depends list for deferred init]
Link: http://lkml.kernel.org/r/20180925201814.3576.15105.stgit@localhost.localdomain
Link: http://lkml.kernel.org/r/1536927045-23536-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b3569d3a 16-Oct-2018 Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>

x86/kconfig: Remove redundant 'default n' lines from all x86 Kconfig's

'default n' is the default value for any bool or tristate Kconfig
setting so there is no need to write it explicitly.

Also, since commit:

f467c5640c29 ("kconfig: only write '# CONFIG_FOO is not set' for visible symbols")

... the Kconfig behavior is the same regardless of 'default n' being present or not:

...
One side effect of (and the main motivation for) this change is making
the following two definitions behave exactly the same:

config FOO
bool

config FOO
bool
default n

With this change, neither of these will generate a
'# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
That might make it clearer to people that a bare 'default n' is
redundant.
...

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.co>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20181016134217eucas1p2102984488b89178a865162553369025b%7EeGpI5NlJo0851008510eucas1p2D@eucas1p2.samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3c88ee194c 25-Apr-2018 Masami Hiramatsu <mhiramat@kernel.org>

x86: ptrace: Add function argument access API

Add regs_get_argument() which returns N th argument of the
function call.
Note that this chooses most probably assignment, in some case
it can be incorrect (e.g. passing data structure or floating
point etc.)

This is expected to be called from kprobes or ftrace with regs
where the top of stack is the return address.

Link: http://lkml.kernel.org/r/152465885737.26224.2822487520472783854.stgit@devbox

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# fa112cf1 05-Oct-2018 Borislav Petkov <bp@suse.de>

x86/olpc: Fix build error with CONFIG_MFD_CS5535=m

When building a 32-bit config which has the above MFD item as module
but OLPC_XO1_PM is enabled =y - which is bool, btw - the kernel fails
building with:

ld: arch/x86/platform/olpc/olpc-xo1-pm.o: in function `xo1_pm_remove':
/home/boris/kernel/linux/arch/x86/platform/olpc/olpc-xo1-pm.c:159: undefined reference to `mfd_cell_disable'
ld: arch/x86/platform/olpc/olpc-xo1-pm.o: in function `xo1_pm_probe':
/home/boris/kernel/linux/arch/x86/platform/olpc/olpc-xo1-pm.c:133: undefined reference to `mfd_cell_enable'
make: *** [Makefile:1030: vmlinux] Error 1

Force MFD_CS5535 to y if OLPC_XO1_PM is enabled.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Lubomir Rintel <lkundrak@v3.sk>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20181005131750.GA5366@zn.tnic


# 2a21ad57 17-Sep-2018 Thomas Gleixner <tglx@linutronix.de>

x86/time: Implement clocksource_arch_init()

Runtime validate the VCLOCK_MODE in clocksource::archdata and disable
VCLOCK if invalid, which disables the VDSO but keeps the system running.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Rickard <matt@softrans.com.au>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20180917130707.069167446@linutronix.de


# 44556530 21-Sep-2018 Zhimin Gu <kookoo.gu@intel.com>

x86-32, hibernate: Enable CONFIG_ARCH_HIBERNATION_HEADER on 32bit system

Enable CONFIG_ARCH_HIBERNATION_HEADER for 32bit system so that

1. arch_hibernation_header_save/restore() are invoked across
hibernation on 32bit system.

2. The checksum handling as well as 'magic' number checking
for 32bit system are enabled.

Controlled by CONFIG_X86_64 in hibernate.c

Signed-off-by: Zhimin Gu <kookoo.gu@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5c280cf6 17-Sep-2018 Thomas Gleixner <tglx@linutronix.de>

x86/mm/cpa: Add large page preservation statistics

The large page preservation mechanism is just magic and provides no
information at all. Add optional statistic output in debugfs so the magic can
be evaluated. Defaults is off.

Output:

1G pages checked: 2
1G pages sameprot: 0
1G pages preserved: 0
2M pages checked: 540
2M pages sameprot: 466
2M pages preserved: 47
4K pages checked: 800770
4K pages set-checked: 7668

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Bin Yang <bin.yang@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Link: https://lkml.kernel.org/r/20180917143546.160867778@linutronix.de


# b34006c4 19-Sep-2018 Ard Biesheuvel <ardb@kernel.org>

x86/jump_table: Use relative references

Similar to the arm64 case, 64-bit x86 can benefit from using relative
references rather than absolute ones when emitting struct jump_entry
instances. Not only does this reduce the memory footprint of the entries
themselves by 33%, it also removes the need for carrying relocation
metadata on relocatable builds (i.e., for KASLR) which saves a fair
chunk of .init space as well (although the savings are not as dramatic
as on arm64)

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jessica Yu <jeyu@kernel.org>
Link: https://lkml.kernel.org/r/20180919065144.25010-7-ard.biesheuvel@linaro.org


# afaef01c 16-Aug-2018 Alexander Popov <alex.popov@linux.com>

x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls

The STACKLEAK feature (initially developed by PaX Team) has the following
benefits:

1. Reduces the information that can be revealed through kernel stack leak
bugs. The idea of erasing the thread stack at the end of syscalls is
similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel
crypto, which all comply with FDP_RIP.2 (Full Residual Information
Protection) of the Common Criteria standard.

2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712,
CVE-2010-2963). That kind of bugs should be killed by improving C
compilers in future, which might take a long time.

This commit introduces the code filling the used part of the kernel
stack with a poison value before returning to userspace. Full
STACKLEAK feature also contains the gcc plugin which comes in a
separate commit.

The STACKLEAK feature is ported from grsecurity/PaX. More information at:
https://grsecurity.net/
https://pax.grsecurity.net/

This code is modified from Brad Spengler/PaX Team's code in the last
public patch of grsecurity/PaX based on our understanding of the code.
Changes or omissions from the original code are ours and don't reflect
the original grsecurity/PaX code.

Performance impact:

Hardware: Intel Core i7-4770, 16 GB RAM

Test #1: building the Linux kernel on a single core
0.91% slowdown

Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
4.2% slowdown

So the STACKLEAK description in Kconfig includes: "The tradeoff is the
performance impact: on a single CPU system kernel compilation sees a 1%
slowdown, other systems and workloads may vary and you are advised to
test this feature on your expected workload before deploying it".

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>


# c00a280a 28-Aug-2018 Juergen Gross <jgross@suse.com>

x86/paravirt: Introduce new config option PARAVIRT_XXL

A large amount of paravirt ops is used by Xen PV guests only. Add a new
config option PARAVIRT_XXL which is selected by XEN_PV. Later we can
put the Xen PV only paravirt ops under the PARAVIRT_XXL umbrella.

Since irq related paravirt ops are used only by VSMP and Xen PV, let
VSMP select PARAVIRT_XXL, too, in order to enable moving the irq ops
under PARAVIRT_XXL.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: akataria@vmware.com
Cc: rusty@rustcorp.com.au
Cc: boris.ostrovsky@oracle.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180828074026.820-11-jgross@suse.com


# e3a5dc08 25-Aug-2018 Nikolas Nyby <nikolas@gnu.org>

x86/Kconfig: Fix trivial typo

Fix a typo in the Kconfig help text: adverticed -> advertised.

Signed-off-by: Nikolas Nyby <nikolas@gnu.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Cc: tglx@linutronix.de
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20180825231054.23813-1-nikolas@gnu.org


# 48a8b97c 22-Aug-2018 Peter Zijlstra <peterz@infradead.org>

x86/mm: Only use tlb_remove_table() for paravirt

If we don't use paravirt; don't play unnecessary and complicated games
to free page-tables.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d86564a2 22-Aug-2018 Peter Zijlstra <peterz@infradead.org>

mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE

Jann reported that x86 was missing required TLB invalidates when he
hit the !*batch slow path in tlb_remove_table().

This is indeed the case; RCU_TABLE_FREE does not provide TLB (cache)
invalidates, the PowerPC-hash where this code originated and the
Sparc-hash where this was subsequently used did not need that. ARM
which later used this put an explicit TLB invalidate in their
__p*_free_tlb() functions, and PowerPC-radix followed that example.

But when we hooked up x86 we failed to consider this. Fix this by
(optionally) hooking tlb_remove_table() into the TLB invalidate code.

NOTE: s390 was also needing something like this and might now
be able to use the generic code again.

[ Modified to be on top of Nick's cleanups, which simplified this patch
now that tlb_flush_mmu_tlbonly() really only flushes the TLB - Linus ]

Fixes: 9e52fc2b50de ("x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 271ca788 21-Aug-2018 Ard Biesheuvel <ardb@kernel.org>

arch: enable relative relocations for arm64, power and x86

Patch series "add support for relative references in special sections", v10.

This adds support for emitting special sections such as initcall arrays,
PCI fixups and tracepoints as relative references rather than absolute
references. This reduces the size by 50% on 64-bit architectures, but
more importantly, it removes the need for carrying relocation metadata for
these sections in relocatable kernels (e.g., for KASLR) that needs to be
fixed up at boot time. On arm64, this reduces the vmlinux footprint of
such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
4 byte relative reference)

Patch #3 was sent out before as a single patch. This series supersedes
the previous submission. This version makes relative ksymtab entries
dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
than trying to infer from kbuild test robot replies for which
architectures it should be blacklisted.

Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
and sets it for the main architectures that are expected to benefit the
most from this feature, i.e., 64-bit architectures or ones that use
runtime relocations.

Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of
ksymtab/kcrctab sections in decompressor and EFI stub objects when
rebuilding existing C files to run in a different context.

Patches #4 - #6 implement relative references for initcalls, PCI fixups
and tracepoints, respectively, all of which produce sections with order
~1000 entries on an arm64 defconfig kernel with tracing enabled. This
means we save about 28 KB of vmlinux space for each of these patches.

[From the v7 series blurb, which included the jump_label patches as well]:

For the arm64 kernel, all patches combined reduce the memory footprint
of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
KASLR enabled), of which ~1 MB is the size reduction of the RELA section
in .init, and the remaining 300 KB is reduction of .text/.data.

This patch (of 6):

Before updating certain subsystems to use place relative 32-bit
relocations in special sections, to save space and reduce the number of
absolute relocations that need to be processed at runtime by relocatable
kernels, introduce the Kconfig symbol and define it for some architectures
that should be able to support and benefit from it.

Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: James Morris <jmorris@namei.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Cc: James Morris <james.morris@microsoft.com>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 87a4c375 31-Jul-2018 Christoph Hellwig <hch@lst.de>

kconfig: include kernel/Kconfig.preempt from init/Kconfig

Almost all architectures include it. Add a ARCH_NO_PREEMPT symbol to
disable preempt support for alpha, hexagon, non-coldfire m68k and
user mode Linux.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# 06ec64b8 31-Jul-2018 Christoph Hellwig <hch@lst.de>

Kconfig: consolidate the "Kernel hacking" menu

Move the source of lib/Kconfig.debug and arch/$(ARCH)/Kconfig.debug to
the top-level Kconfig. For two architectures that means moving their
arch-specific symbols in that menu into a new arch Kconfig.debug file,
and for a few more creating a dummy file so that we can include it
unconditionally.

Also move the actual 'Kernel hacking' menu to lib/Kconfig.debug, where
it belongs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# 1572497c 31-Jul-2018 Christoph Hellwig <hch@lst.de>

kconfig: include common Kconfig files from top-level Kconfig

Instead of duplicating the source statements in every architecture just
do it once in the toplevel Kconfig file.

Note that with this the inclusion of arch/$(SRCARCH/Kconfig moves out of
the top-level Kconfig into arch/Kconfig so that don't violate ordering
constraits while keeping a sensible menu structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# 2c870e61 24-Jul-2018 Arnd Bergmann <arnd@arndb.de>

arm64: fix ACPI dependencies

Kconfig reports a warning on x86 builds after the ARM64 dependency
was added.

drivers/acpi/Kconfig:6:error: recursive dependency detected!
drivers/acpi/Kconfig:6: symbol ACPI depends on EFI

This rephrases the dependency to keep the ARM64 details out of the
shared Kconfig file, so Kconfig no longer gets confused by it.

For consistency, all three architectures that support ACPI now
select ARCH_SUPPORTS_ACPI in exactly the configuration in which
they allow it. We still need the 'default x86', as each one
wants a different default: default-y on x86, default-n on arm64,
and always-y on ia64.

Fixes: 5bcd44083a08 ("drivers: acpi: add dependency of EFI for arm64")
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 092b31aa 08-Jul-2018 Dan Williams <dan.j.williams@intel.com>

x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling

All copy_to_user() implementations need to be prepared to handle faults
accessing userspace. The __memcpy_mcsafe() implementation handles both
mmu-faults on the user destination and machine-check-exceptions on the
source buffer. However, the memcpy_mcsafe() wrapper may silently
fallback to memcpy() depending on build options and cpu-capabilities.

Force copy_to_user_mcsafe() to always use __memcpy_mcsafe() when
available, and otherwise disable all of the copy_to_user_mcsafe()
infrastructure when __memcpy_mcsafe() is not available, i.e.
CONFIG_X86_MCE=n.

This fixes crashes of the form:
run fstests generic/323 at 2018-07-02 12:46:23
BUG: unable to handle kernel paging request at 00007f0d50001000
RIP: 0010:__memcpy+0x12/0x20
[..]
Call Trace:
copyout_mcsafe+0x3a/0x50
_copy_to_iter_mcsafe+0xa1/0x4a0
? dax_alive+0x30/0x50
dax_iomap_actor+0x1f9/0x280
? dax_iomap_rw+0x100/0x100
iomap_apply+0xba/0x130
? dax_iomap_rw+0x100/0x100
dax_iomap_rw+0x95/0x100
? dax_iomap_rw+0x100/0x100
xfs_file_dax_read+0x7b/0x1d0 [xfs]
xfs_file_read_iter+0xa7/0xc0 [xfs]
aio_read+0x11c/0x1a0

Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
Link: http://lkml.kernel.org/r/153108277790.37979.1486841789275803399.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6415b38b 18-May-2018 Jiri Slaby <jirislaby@kernel.org>

x86/stacktrace: Enable HAVE_RELIABLE_STACKTRACE for the ORC unwinder

In SUSE, we need a reliable stack unwinder for kernel live patching, but
we do not want to enable frame pointers for performance reasons. So
after the previous patches to make the ORC reliable, mark ORC as a
reliable stack unwinder on x86.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/lkml/20180518064713.26440-6-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 05736e4a 29-May-2018 Thomas Gleixner <tglx@linutronix.de>

cpu/hotplug: Provide knobs to control SMT

Provide a command line and a sysfs knob to control SMT.

The command line options are:

'nosmt': Enumerate secondary threads, but do not online them

'nosmt=force': Ignore secondary threads completely during enumeration
via MP table and ACPI/MADT.

The sysfs control file has the following states (read/write):

'on': SMT is enabled. Secondary threads can be freely onlined
'off': SMT is disabled. Secondary threads, even if enumerated
cannot be onlined
'forceoff': SMT is permanentely disabled. Writes to the control
file are rejected.
'notsupported': SMT is not supported by the CPU

The command line option 'nosmt' sets the sysfs control to 'off'. This
can be changed to 'on' to reenable SMT during runtime.

The command line option 'nosmt=force' sets the sysfs control to
'forceoff'. This cannot be changed during runtime.

When SMT is 'on' and the control file is changed to 'off' then all online
secondary threads are offlined and attempts to online a secondary thread
later on are rejected.

When SMT is 'off' and the control file is changed to 'on' then secondary
threads can be onlined again. The 'off' -> 'on' transition does not
automatically online the secondary threads.

When the control file is set to 'forceoff', the behaviour is the same as
setting it to 'off', but the operation is irreversible and later writes to
the control file are rejected.

When the control status is 'notsupported' then writes to the control file
are rejected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>


# d148eac0 14-Jun-2018 Masahiro Yamada <yamada.masahiro@socionext.com>

Kbuild: rename HAVE_CC_STACKPROTECTOR config variable

HAVE_CC_STACKPROTECTOR should be selected by architectures with stack
canary implementation. It is not about the compiler support.

For the consistency with commit 050e9baa9dc9 ("Kbuild: rename
CC_STACKPROTECTOR[_STRONG] config variables"), remove 'CC_' from the
config symbol.

I moved the 'select' lines to keep the alphabetical sorting.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8458f8c2 14-Jun-2018 Masahiro Yamada <yamada.masahiro@socionext.com>

x86: fix dependency of X86_32_LAZY_GS

Commit 2a61f4747eea ("stack-protector: test compiler capability in
Kconfig and drop AUTO mode") replaced the 'choice' with two boolean
symbols, so CC_STACKPROTECTOR_NONE no longer exists.

Prior to commit 2bc2f688fdf8 ("Makefile: move stack-protector
availability out of Kconfig"), this line was like this:

depends on X86_32 && !CC_STACKPROTECTOR

The CC_ prefix was dropped by commit 050e9baa9dc9 ("Kbuild: rename
CC_STACKPROTECTOR[_STRONG] config variables"), so the dependency now
should be:

depends on X86_32 && !STACKPROTECTOR

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2a61f474 28-May-2018 Masahiro Yamada <yamada.masahiro@socionext.com>

stack-protector: test compiler capability in Kconfig and drop AUTO mode

Move the test for -fstack-protector(-strong) option to Kconfig.

If the compiler does not support the option, the corresponding menu
is automatically hidden. If STRONG is not supported, it will fall
back to REGULAR. If REGULAR is not supported, it will be disabled.
This means, AUTO is implicitly handled by the dependency solver of
Kconfig, hence removed.

I also turned the 'choice' into only two boolean symbols. The use of
'choice' is not a good idea here, because all of all{yes,mod,no}config
would choose the first visible value, while we want allnoconfig to
disable as many features as possible.

X86 has additional shell scripts in case the compiler supports those
options, but generates broken code. I added CC_HAS_SANE_STACKPROTECTOR
to test this. I had to add -m32 to gcc-x86_32-has-stack-protector.sh
to make it work correctly.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>


# 3010a5ea 07-Jun-2018 Laurent Dufour <ldufour@linux.vnet.ibm.com>

mm: introduce ARCH_HAS_PTE_SPECIAL

Currently the PTE special supports is turned on in per architecture
header files. Most of the time, it is defined in
arch/*/include/asm/pgtable.h depending or not on some other per
architecture static definition.

This patch introduce a new configuration variable to manage this
directly in the Kconfig files. It would later replace
__HAVE_ARCH_PTE_SPECIAL.

Here notes for some architecture where the definition of
__HAVE_ARCH_PTE_SPECIAL is not obvious:

arm
__HAVE_ARCH_PTE_SPECIAL which is currently defined in
arch/arm/include/asm/pgtable-3level.h which is included by
arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.

powerpc
__HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
- arch/powerpc/include/asm/book3s/64/pgtable.h
- arch/powerpc/include/asm/pte-common.h
The first one is included if (PPC_BOOK3S & PPC64) while the second is
included in all the other cases.
So select ARCH_HAS_PTE_SPECIAL all the time.

sparc:
__HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
defined(__arch64__) which are defined through the compiler in
sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
So select ARCH_HAS_PTE_SPECIAL if SPARC64

There is no functional change introduced by this patch.

Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Christophe LEROY <christophe.leroy@c-s.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 94d49eb3 18-May-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME

AMD SME claims one bit from physical address to indicate whether the
page is encrypted or not. To achieve that we clear out the bit from
__PHYSICAL_MASK.

The capability to adjust __PHYSICAL_MASK is required beyond AMD SME.
For instance for upcoming Intel Multi-Key Total Memory Encryption.

Factor it out into a separate feature with own Kconfig handle.

It also helps with overhead of AMD SME. It saves more than 3k in .text
on defconfig + AMD_MEM_ENCRYPT:

add/remove: 3/2 grow/shrink: 5/110 up/down: 189/-3753 (-3564)

We would need to return to this once we have infrastructure to patch
constants in code. That's good candidate for it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-mm@kvack.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180518113028.79825-1-kirill.shutemov@linux.intel.com


# d6761b8f 02-Jun-2018 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

x86: Add support for restartable sequences

Call the rseq_handle_notify_resume() function on return to userspace if
TIF_NOTIFY_RESUME thread flag is set.

Perform fixup on the pre-signal frame when a signal is delivered on top
of a restartable sequence critical section.

Check that system calls are not invoked from within rseq critical
sections by invoking rseq_signal() from syscall_return_slowpath().
With CONFIG_DEBUG_RSEQ, such behavior results in termination of the
process with SIGSEGV.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Hunter <ahh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180602124408.8430-7-mathieu.desnoyers@efficios.com


# 104daea1 28-May-2018 Masahiro Yamada <yamada.masahiro@socionext.com>

kconfig: reference environment variables directly and remove 'option env='

To get access to environment variables, Kconfig needs to define a
symbol using "option env=" syntax. It is tedious to add a symbol entry
for each environment variable given that we need to define much more
such as 'CC', 'AS', 'srctree' etc. to evaluate the compiler capability
in Kconfig.

Adding '$' for symbol references is grammatically inconsistent.
Looking at the code, the symbols prefixed with 'S' are expanded by:
- conf_expand_value()
This is used to expand 'arch/$ARCH/defconfig' and 'defconfig_list'
- sym_expand_string_value()
This is used to expand strings in 'source' and 'mainmenu'

All of them are fixed values independent of user configuration. So,
they can be changed into the direct expansion instead of symbols.

This change makes the code much cleaner. The bounce symbols 'SRCARCH',
'ARCH', 'SUBARCH', 'KERNELVERSION' are gone.

sym_init() hard-coding 'UNAME_RELEASE' is also gone. 'UNAME_RELEASE'
should be replaced with an environment variable.

ARCH_DEFCONFIG is a normal symbol, so it should be simply referenced
without '$' prefix.

The new syntax is addicted by Make. The variable reference needs
parentheses, like $(FOO), but you can omit them for single-letter
variables, like $F. Yet, in Makefiles, people tend to use the
parenthetical form for consistency / clarification.

At this moment, only the environment variable is supported, but I will
extend the concept of 'variable' later on.

The variables are expanded in the lexer so we can simplify the token
handling on the parser side.

For example, the following code works.

[Example code]

config MY_TOOLCHAIN_LIST
string
default "My tools: CC=$(CC), AS=$(AS), CPP=$(CPP)"

[Result]

$ make -s alldefconfig && tail -n 1 .config
CONFIG_MY_TOOLCHAIN_LIST="My tools: CC=gcc, AS=as, CPP=gcc -E"

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>


# 8780356e 03-May-2018 Dan Williams <dan.j.williams@intel.com>

x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()

Use the updated memcpy_mcsafe() implementation to define
copy_user_mcsafe() and copy_to_iter_mcsafe(). The most significant
difference from typical copy_to_iter() is that the ITER_KVEC and
ITER_BVEC iterator types can fail to complete a full transfer.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: hch@lst.de
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/152539239150.31796.9189779163576449784.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 09230cbc 24-Apr-2018 Christoph Hellwig <hch@lst.de>

swiotlb: move the SWIOTLB config symbol to lib/Kconfig

This way we have one central definition of it, and user can select it as
needed. The new option is not user visible, which is the behavior
it had in most architectures, with a few notable exceptions:

- On x86_64 and mips/loongson3 it used to be user selectable, but
defaulted to y. It now is unconditional, which seems like the right
thing for 64-bit architectures without guaranteed availablity of
IOMMUs.
- on powerpc the symbol is user selectable and defaults to n, but
many boards select it. This change assumes no working setup
required a manual selection, but if that turned out to be wrong
we'll have to add another select statement or two for the respective
boards.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4965a687 03-Apr-2018 Christoph Hellwig <hch@lst.de>

arch: define the ARCH_DMA_ADDR_T_64BIT config symbol in lib/Kconfig

Define this symbol if the architecture either uses 64-bit pointers or the
PHYS_ADDR_T_64BIT is set. This covers 95% of the old arch magic. We only
need an additional select for Xen on ARM (why anyway?), and we now always
set ARCH_DMA_ADDR_T_64BIT on mips boards with 64-bit physical addressing
instead of only doing it when highmem is set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Hogan <jhogan@kernel.org>


# d4a451d5 03-Apr-2018 Christoph Hellwig <hch@lst.de>

arch: remove the ARCH_PHYS_ADDR_T_64BIT config symbol

Instead select the PHYS_ADDR_T_64BIT for 32-bit architectures that need a
64-bit phys_addr_t type directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Hogan <jhogan@kernel.org>


# f616ab59 08-May-2018 Christoph Hellwig <hch@lst.de>

dma-mapping: move the NEED_DMA_MAP_STATE config symbol to lib/Kconfig

This way we have one central definition of it, and user can select it as
needed. Note that we now also always select it when CONFIG_DMA_API_DEBUG
is select, which fixes some incorrect checks in a few network drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>


# 86596f0a 05-Apr-2018 Christoph Hellwig <hch@lst.de>

scatterlist: move the NEED_SG_DMA_LENGTH config symbol to lib/Kconfig

This way we have one central definition of it, and user can select it as
needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>


# a4ce5a48 03-Apr-2018 Christoph Hellwig <hch@lst.de>

iommu-helper: move the IOMMU_HELPER config symbol to lib/

This way we have one central definition of it, and user can select it as
needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>


# 79c1879e 03-Apr-2018 Christoph Hellwig <hch@lst.de>

iommu-helper: mark iommu_is_span_boundary as inline

This avoids selecting IOMMU_HELPER just for this function. And we only
use it once or twice in normal builds so this often even is a size
reduction.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6e88628d 08-May-2018 Christoph Hellwig <hch@lst.de>

dma-debug: remove CONFIG_HAVE_DMA_API_DEBUG

There is no arch specific code required for dma-debug, so there is no
need to opt into the support either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>


# 03f5781b 03-May-2018 Wang YanQing <udknight@gmail.com>

bpf, x86_32: add eBPF JIT compiler for ia32

The JIT compiler emits ia32 bit instructions. Currently, It supports eBPF
only. Classic BPF is supported because of the conversion by BPF core.

Almost all instructions from eBPF ISA supported except the following:
BPF_ALU64 | BPF_DIV | BPF_K
BPF_ALU64 | BPF_DIV | BPF_X
BPF_ALU64 | BPF_MOD | BPF_K
BPF_ALU64 | BPF_MOD | BPF_X
BPF_STX | BPF_XADD | BPF_W
BPF_STX | BPF_XADD | BPF_DW

It doesn't support BPF_JMP|BPF_CALL with BPF_PSEUDO_CALL at the moment.

IA32 has few general purpose registers, EAX|EDX|ECX|EBX|ESI|EDI. I use
EAX|EDX|ECX|EBX as temporary registers to simulate instructions in eBPF
ISA, and allocate ESI|EDI to BPF_REG_AX for constant blinding, all others
eBPF registers, R0-R10, are simulated through scratch space on stack.

The reasons behind the hardware registers allocation policy are:
1:MUL need EAX:EDX, shift operation need ECX, so they aren't fit
for general eBPF 64bit register simulation.
2:We need at least 4 registers to simulate most eBPF ISA operations
on registers operands instead of on register&memory operands.
3:We need to put BPF_REG_AX on hardware registers, or constant blinding
will degrade jit performance heavily.

Tested on PC (Intel(R) Core(TM) i5-5200U CPU).
Testing results on i5-5200U:
1) test_bpf: Summary: 349 PASSED, 0 FAILED, [319/341 JIT'ed]
2) test_progs: Summary: 83 PASSED, 0 FAILED.
3) test_lpm: OK
4) test_lru_map: OK
5) test_verifier: Summary: 828 PASSED, 0 FAILED.

Above tests are all done in following two conditions separately:
1:bpf_jit_enable=1 and bpf_jit_harden=0
2:bpf_jit_enable=1 and bpf_jit_harden=2

Below are some numbers for this jit implementation:
Note:
I run test_progs in kselftest 100 times continuously for every condition,
the numbers are in format: total/times=avg.
The numbers that test_bpf reports show almost the same relation.

a:jit_enable=0 and jit_harden=0 b:jit_enable=1 and jit_harden=0
test_pkt_access:PASS:ipv4:15622/100=156 test_pkt_access:PASS:ipv4:10674/100=106
test_pkt_access:PASS:ipv6:9130/100=91 test_pkt_access:PASS:ipv6:4855/100=48
test_xdp:PASS:ipv4:240198/100=2401 test_xdp:PASS:ipv4:138912/100=1389
test_xdp:PASS:ipv6:137326/100=1373 test_xdp:PASS:ipv6:68542/100=685
test_l4lb:PASS:ipv4:61100/100=611 test_l4lb:PASS:ipv4:37302/100=373
test_l4lb:PASS:ipv6:101000/100=1010 test_l4lb:PASS:ipv6:55030/100=550

c:jit_enable=1 and jit_harden=2
test_pkt_access:PASS:ipv4:10558/100=105
test_pkt_access:PASS:ipv6:5092/100=50
test_xdp:PASS:ipv4:131902/100=1319
test_xdp:PASS:ipv6:77932/100=779
test_l4lb:PASS:ipv4:38924/100=389
test_l4lb:PASS:ipv6:57520/100=575

The numbers show we get 30%~50% improvement.

See Documentation/networking/filter.txt for more information.

Changelog:

Changes v5-v6:
1:Add do {} while (0) to RETPOLINE_RAX_BPF_JIT for
consistence reason.
2:Clean up non-standard comments, reported by Daniel Borkmann.
3:Fix a memory leak issue, repoted by Daniel Borkmann.

Changes v4-v5:
1:Delete is_on_stack, BPF_REG_AX is the only one
on real hardware registers, so just check with
it.
2:Apply commit 1612a981b766 ("bpf, x64: fix JIT emission
for dead code"), suggested by Daniel Borkmann.

Changes v3-v4:
1:Fix changelog in commit.
I install llvm-6.0, then test_progs willn't report errors.
I submit another patch:
"bpf: fix misaligned access for BPF_PROG_TYPE_PERF_EVENT program type on x86_32 platform"
to fix another problem, after that patch, test_verifier willn't report errors too.
2:Fix clear r0[1] twice unnecessarily in *BPF_IND|BPF_ABS* simulation.

Changes v2-v3:
1:Move BPF_REG_AX to real hardware registers for performance reason.
3:Using bpf_load_pointer instead of bpf_jit32.S, suggested by Daniel Borkmann.
4:Delete partial codes in 1c2a088a6626, suggested by Daniel Borkmann.
5:Some bug fixes and comments improvement.

Changes v1-v2:
1:Fix bug in emit_ia32_neg64.
2:Fix bug in emit_ia32_arsh_r64.
3:Delete filename in top level comment, suggested by Thomas Gleixner.
4:Delete unnecessary boiler plate text, suggested by Thomas Gleixner.
5:Rewrite some words in changelog.
6:CodingSytle improvement and a little more comments.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 316d097c 20-Apr-2018 Dave Hansen <dave.hansen@linux.intel.com>

x86/pti: Filter at vma->vm_page_prot population

commit ce9962bf7e22bb3891655c349faff618922d4a73

0day reported warnings at boot on 32-bit systems without NX support:

attempted to set unsupported pgprot: 8000000000000025 bits: 8000000000000000 supported: 7fffffffffffffff
WARNING: CPU: 0 PID: 1 at
arch/x86/include/asm/pgtable.h:540 handle_mm_fault+0xfc1/0xfe0:
check_pgprot at arch/x86/include/asm/pgtable.h:535
(inlined by) pfn_pte at arch/x86/include/asm/pgtable.h:549
(inlined by) do_anonymous_page at mm/memory.c:3169
(inlined by) handle_pte_fault at mm/memory.c:3961
(inlined by) __handle_mm_fault at mm/memory.c:4087
(inlined by) handle_mm_fault at mm/memory.c:4124

The problem is that due to the recent commit which removed auto-massaging
of page protections, filtering page permissions at PTE creation time is not
longer done, so vma->vm_page_prot is passed unfiltered to PTE creation.

Filter the page protections before they are installed in vma->vm_page_prot.

Fixes: fb43d6cb91 ("x86/mm: Do not auto-massage page protections")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: https://lkml.kernel.org/r/20180420222028.99D72858@viggo.jf.intel.com


# b799a09f 13-Apr-2018 AKASHI Takahiro <takahiro.akashi@linaro.org>

kexec_file: make use of purgatory optional

Patch series "kexec_file, x86, powerpc: refactoring for other
architecutres", v2.

This is a preparatory patchset for adding kexec_file support on arm64.

It was originally included in a arm64 patch set[1], but Philipp is also
working on their kexec_file support on s390[2] and some changes are now
conflicting.

So these common parts were extracted and put into a separate patch set
for better integration. What's more, my original patch#4 was split into
a few small chunks for easier review after Dave's comment.

As such, the resulting code is basically identical with my original, and
the only *visible* differences are:

- renaming of _kexec_kernel_image_probe() and _kimage_file_post_load_cleanup()

- change one of types of arguments at prepare_elf64_headers()

Those, unfortunately, require a couple of trivial changes on the rest
(#1, #6 to #13) of my arm64 kexec_file patch set[1].

Patch #1 allows making a use of purgatory optional, particularly useful
for arm64.

Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load,
verify_sig}() and arch_kimage_file_post_load_cleanup() across
architectures.

Patches #3-#7 are also intended to generalize parse_elf64_headers(),
along with exclude_mem_range(), to be made best re-use of.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html
[2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html

This patch (of 7):

On arm64, crash dump kernel's usable memory is protected by *unmapping*
it from kernel virtual space unlike other architectures where the region
is just made read-only. It is highly unlikely that the region is
accidentally corrupted and this observation rationalizes that digest
check code can also be dropped from purgatory. The resulting code is so
simple as it doesn't require a bit ugly re-linking/relocation stuff,
i.e. arch_kexec_apply_relocations_add().

Please see:

http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html

All that the purgatory does is to shuffle arguments and jump into a new
kernel, while we still need to have some space for a hash value
(purgatory_sha256_digest) which is never checked against.

As such, it doesn't make sense to have trampline code between old kernel
and new kernel on arm64.

This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and
allows related code to be compiled in only if necessary.

[takahiro.akashi@linaro.org: fix trivial screwup]
Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org
Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 92e830f2 04-Apr-2018 Arnd Bergmann <arnd@arndb.de>

x86/olpc: Fix inconsistent MFD_CS5535 configuration

This Kconfig warning appeared after a fix to the Kconfig validation.
The GPIO_CS5535 driver depends on the MFD_CS5535 driver, but the former
is selected in places where the latter is not:

WARNING: unmet direct dependencies detected for GPIO_CS5535
Depends on [m]: GPIOLIB [=y] && (X86 [=y] || MIPS || COMPILE_TEST [=y]) && MFD_CS5535 [=m]
Selected by [y]:
- OLPC_XO1_SCI [=y] && X86_32 [=y] && OLPC [=y] && OLPC_XO1_PM [=y] && INPUT [=y]=y

The warning does seem appropriate, since the GPIO_CS5535 driver won't
work unless MFD_CS5535 is also present. However, there is no link time
dependency between the two, so this caused no problems during randconfig
testing before.

This changes the 'select GPIO_CS5535' to 'depends on GPIO_CS5535' to
avoid the issue, at the expense of making it harder to configure the
driver (one now has to select the dependencies first).

The 'select MFD_CORE' part is completely redundant, since we already
depend on MFD_CS5535 here, so I'm removing that as well.

Ideally, the private symbols exported by that cs5535 gpio driver would
just be converted to gpiolib interfaces so we could expletely avoid
this dependency.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kbuild@vger.kernel.org
Fixes: f622f8279581 ("kconfig: warn unmet direct dependency of tristate symbols selected by y")
Link: http://lkml.kernel.org/r/20180404124539.3817101-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f8781c4a 05-Apr-2018 Dominik Brodowski <linux@dominikbrodowski.net>

syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64

Removing CONFIG_SYSCALL_PTREGS from arch/x86/Kconfig and simply selecting
ARCH_HAS_SYSCALL_WRAPPER unconditionally on x86-64 allows us to simplify
several codepaths.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-7-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ebeb8c82 05-Apr-2018 Dominik Brodowski <linux@dominikbrodowski.net>

syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32

Extend ARCH_HAS_SYSCALL_WRAPPER for i386 emulation and for x32 on 64-bit
x86.

For x32, all we need to do is to create an additional stub for each
compat syscall which decodes the parameters in x86-64 ordering, e.g.:

asmlinkage long __compat_sys_x32_xyzzy(struct pt_regs *regs)
{
return c_SyS_xyzzy(regs->di, regs->si, regs->dx);
}

For i386 emulation, we need to teach compat_sys_*() to take struct
pt_regs as its only argument, e.g.:

asmlinkage long __compat_sys_ia32_xyzzy(struct pt_regs *regs)
{
return c_SyS_xyzzy(regs->bx, regs->cx, regs->dx);
}

In addition, we need to create additional stubs for common syscalls
(that is, for syscalls which have the same parameters on 32-bit and
64-bit), e.g.:

asmlinkage long __sys_ia32_xyzzy(struct pt_regs *regs)
{
return c_sys_xyzzy(regs->bx, regs->cx, regs->dx);
}

This approach avoids leaking random user-provided register content down
the call chain.

This patch is based on an original proof-of-concept

| From: Linus Torvalds <torvalds@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

and was split up and heavily modified by me, in particular to base it on
ARCH_HAS_SYSCALL_WRAPPER.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-6-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# fa697140 05-Apr-2018 Dominik Brodowski <linux@dominikbrodowski.net>

syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls

Let's make use of ARCH_HAS_SYSCALL_WRAPPER=y on pure 64-bit x86-64 systems:

Each syscall defines a stub which takes struct pt_regs as its only
argument. It decodes just those parameters it needs, e.g:

asmlinkage long sys_xyzzy(const struct pt_regs *regs)
{
return SyS_xyzzy(regs->di, regs->si, regs->dx);
}

This approach avoids leaking random user-provided register content down
the call chain.

For example, for sys_recv() which is a 4-parameter syscall, the assembly
now is (in slightly reordered fashion):

<sys_recv>:
callq <__fentry__>

/* decode regs->di, ->si, ->dx and ->r10 */
mov 0x70(%rdi),%rdi
mov 0x68(%rdi),%rsi
mov 0x60(%rdi),%rdx
mov 0x38(%rdi),%rcx

[ SyS_recv() is automatically inlined by the compiler,
as it is not [yet] used anywhere else ]
/* clear %r9 and %r8, the 5th and 6th args */
xor %r9d,%r9d
xor %r8d,%r8d

/* do the actual work */
callq __sys_recvfrom

/* cleanup and return */
cltq
retq

The only valid place in an x86-64 kernel which rightfully calls
a syscall function on its own -- vsyscall -- needs to be modified
to pass struct pt_regs onwards as well.

To keep the syscall table generation working independent of
SYSCALL_PTREGS being enabled, the stubs are named the same as the
"original" syscall stubs, i.e. sys_*().

This patch is based on an original proof-of-concept

| From: Linus Torvalds <torvalds@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

and was split up and heavily modified by me, in particular to base it on
ARCH_HAS_SYSCALL_WRAPPER, to limit it to 64-bit-only for the time being,
and to update the vsyscall to the new calling convention.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-4-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1897a969 09-Feb-2018 Jaak Ristioja <jaak@ristioja.ee>

Documentation: Fix early-microcode.txt references after file rename

The file Documentation/x86/early-microcode.txt was renamed to
Documentation/x86/microcode.txt in 0e3258753f81, but it was still
referenced by its old name in a three places:

* Documentation/x86/00-INDEX
* arch/x86/Kconfig
* arch/x86/kernel/cpu/microcode/amd.c

This commit updates these references accordingly.

Fixes: 0e3258753f81 ("x86/microcode: Document the three loading methods")
Signed-off-by: Jaak Ristioja <jaak@ristioja.ee>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# fc5d1073 27-Mar-2018 David Rientjes <rientjes@google.com>

x86/mm/32: Remove unused node_memmap_size_bytes() & CONFIG_NEED_NODE_MEMMAP_SIZE logic

node_memmap_size_bytes() has been unused since the v3.9 kernel, so remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Fixes: f03574f2d5b2 ("x86-32, mm: Rip out x86_32 NUMA remapping code")
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803262325540.256524@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d0266046 19-Mar-2018 Peter Zijlstra <peterz@infradead.org>

x86: Remove FAST_FEATURE_TESTS

Since we want to rely on static branches to avoid speculation, remove
any possible fallback code for static_cpu_has.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: torvalds@linux-foundation.org
Link: https://lkml.kernel.org/r/20180319154717.705383007@infradead.org


# b6e05477 19-Mar-2018 Christoph Hellwig <hch@lst.de>

dma/direct: Handle the memory encryption bit in common code

Give the basic phys_to_dma() and dma_to_phys() helpers a __-prefix and add
the memory encryption mask to the non-prefixed versions. Use the
__-prefixed versions directly instead of clearing the mask again in
various places.

Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20180319103826.12853-13-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# fec777c3 19-Mar-2018 Christoph Hellwig <hch@lst.de>

x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y)

The generic DMA-direct (CONFIG_DMA_DIRECT_OPS=y) implementation is now
functionally equivalent to the x86 nommu dma_map implementation, so
switch over to using it.

That includes switching from using x86_dma_supported in various IOMMU
drivers to use dma_direct_supported instead, which provides the same
functionality.

Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20180319103826.12853-4-hch@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 8364e1f8 07-Mar-2018 Jan Kiszka <jan.kiszka@siemens.com>

x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, its
required to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and
ACPI, instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Cc: linux-pci@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lkml.kernel.org/r/788bbd5325d1922235e9562c213057425fbc548c.1520408357.git.jan.kiszka@siemens.com


# b45c9f36 07-Mar-2018 Jan Kiszka <jan.kiszka@siemens.com>

x86: Consolidate PCI_MMCONFIG configs

Since e279b6c1d329 ("x86: start unification of arch/x86/Kconfig.*"), there
exist two PCI_MMCONFIG entries, one from the original i386 and another from
x86_64. Consolidate both entries into a single one.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Cc: linux-pci@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lkml.kernel.org/r/2a0ccd51ea6f7996e07162918228e23bdc1fbb03.1520408357.git.jan.kiszka@siemens.com


# 55027a77 07-Mar-2018 Jan Kiszka <jan.kiszka@siemens.com>

x86: Align x86_64 PCI_MMCONFIG with 32-bit variant

Allow to enable PCI_MMCONFIG when only SFI is present and make this option
default on. This will help consolidating both into one Kconfig statement.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Cc: linux-pci@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lkml.kernel.org/r/a2faf78c54f340f5549149e8b679c95950dae83d.1520408357.git.jan.kiszka@siemens.com


# 076ca272 07-Mar-2018 Andy Lutomirski <luto@kernel.org>

x86/vsyscall/64: Drop "native" vsyscalls

Since Linux v3.2, vsyscalls have been deprecated and slow. From v3.2
on, Linux had three vsyscall modes: "native", "emulate", and "none".

"emulate" is the default. All known user programs work correctly in
emulate mode, but vsyscalls turn into page faults and are emulated.
This is very slow. In "native" mode, the vsyscall page is easily
usable as an exploit gadget, but vsyscalls are a bit faster -- they
turn into normal syscalls. (This is in contrast to vDSO functions,
which can be much faster than syscalls.) In "none" mode, there are
no vsyscalls.

For all practical purposes, "native" was really just a chicken bit
in case something went wrong with the emulation. It's been over six
years, and nothing has gone wrong. Delete it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/519fee5268faea09ae550776ce969fa6e88668b0.1520449896.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 17a2a129 29-Dec-2017 William Breathitt Gray <vilhelm.gray@gmail.com>

isa: Remove ISA_BUS_API selection for ISA_BUS

ISA_BUS_API is selected by drivers themselves when necessary. The
ISA_BUS Kconfig option is now simply a mask for true ISA device drivers
and relevant configuration. For now, the ISA_BUS Kconfig option is only
available for X86, but may be added for other arch builds in the future
if the need arises.

Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>


# d5028ba8 06-Feb-2018 Peter Zijlstra <peterz@infradead.org>

objtool, retpolines: Integrate objtool with retpoline support more closely

Disable retpoline validation in objtool if your compiler sucks, and otherwise
select the validation stuff for CONFIG_RETPOLINE=y (most builds would already
have it set due to ORC).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6657fca0 14-Feb-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y

All pieces of the puzzle are in place and we can now allow to boot with
CONFIG_X86_5LEVEL=y on a machine without LA57 support.

Kernel will detect that LA57 is missing and fold p4d at runtime.

Update the documentation and the Kconfig option description to reflect the
change.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-10-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 69b8d3fc 15-Feb-2018 Matthew Whitehead <tedheadster@gmail.com>

x86/Kconfig: Exclude i586-class CPUs lacking PAE support from the HIGHMEM64G Kconfig group

i586-class machines also lack support for Physical Address Extension (PAE),
so add them to the exclusion list.

Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1518713696-11360-2-git-send-email-tedheadster@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 162434e7 14-Feb-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Make MAX_PHYSADDR_BITS and MAX_PHYSMEM_BITS dynamic

For boot-time switching between paging modes, we need to be able to
adjust size of physical address space at runtime.

As part of making physical address space size variable, we have to make
X86_5LEVEL dependent on SPARSEMEM_VMEMMAP. !SPARSEMEM_VMEMMAP
configuration doesn't build with variable MAX_PHYSMEM_BITS.

For !SPARSEMEM_VMEMMAP SECTIONS_WIDTH depends on MAX_PHYSMEM_BITS:

SECTIONS_WIDTH
SECTIONS_SHIFT
MAX_PHYSMEM_BITS

And SECTIONS_WIDTH is used on pre-processor stage, it doesn't work if it's
dyncamic. See include/linux/page-flags-layout.h.

Effect on kernel image size:

text data bss dec hex filename
8628393 4734340 1368064 14730797 e0c62d vmlinux.before
8628892 4734340 1368064 14731296 e0c820 vmlinux.after

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-8-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# eedb92ab 14-Feb-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y

We need to be able to adjust virtual memory layout at runtime to be able
to switch between 4- and 5-level paging at boot-time.

KASLR already has movable __VMALLOC_BASE, __VMEMMAP_BASE and __PAGE_OFFSET.
Let's re-use it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# aec6487e 09-Feb-2018 Ingo Molnar <mingo@kernel.org>

x86/Kconfig: Further simplify the NR_CPUS config

Clean up various aspects of the x86 CONFIG_NR_CPUS configuration switches:

- Rename the three CONFIG_NR_CPUS related variables to create a common
namespace for them:

RANGE_BEGIN_CPUS => NR_CPUS_RANGE_BEGIN
RANGE_END_CPUS => NR_CPUS_RANGE_END
DEF_CONFIG_CPUS => NR_CPUS_DEFAULT

- Align them vertically, such as:

config NR_CPUS_RANGE_END
int
depends on X86_64
default 8192 if SMP && ( MAXSMP || CPUMASK_OFFSTACK)
default 512 if SMP && (!MAXSMP && !CPUMASK_OFFSTACK)
default 1 if !SMP

- Update help text, add more comments.

Test results:

# i386 allnoconfig:
CONFIG_NR_CPUS_RANGE_BEGIN=1
CONFIG_NR_CPUS_RANGE_END=1
CONFIG_NR_CPUS_DEFAULT=1
CONFIG_NR_CPUS=1

# i386 defconfig:
CONFIG_NR_CPUS_RANGE_BEGIN=2
CONFIG_NR_CPUS_RANGE_END=8
CONFIG_NR_CPUS_DEFAULT=8
CONFIG_NR_CPUS=8

# i386 allyesconfig:
CONFIG_NR_CPUS_RANGE_BEGIN=2
CONFIG_NR_CPUS_RANGE_END=64
CONFIG_NR_CPUS_DEFAULT=32
CONFIG_NR_CPUS=32

# x86_64 allnoconfig:
CONFIG_NR_CPUS_RANGE_BEGIN=1
CONFIG_NR_CPUS_RANGE_END=1
CONFIG_NR_CPUS_DEFAULT=1
CONFIG_NR_CPUS=1

# x86_64 defconfig:
CONFIG_NR_CPUS_RANGE_BEGIN=2
CONFIG_NR_CPUS_RANGE_END=512
CONFIG_NR_CPUS_DEFAULT=64
CONFIG_NR_CPUS=64

# x86_64 allyesconfig:
CONFIG_NR_CPUS_RANGE_BEGIN=8192
CONFIG_NR_CPUS_RANGE_END=8192
CONFIG_NR_CPUS_DEFAULT=8192
CONFIG_NR_CPUS=8192

Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180210113629.jcv6su3r4suuno63@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# a0d0bb4d 09-Feb-2018 Randy Dunlap <rdunlap@infradead.org>

x86/Kconfig: Simplify NR_CPUS config

Clean up and simplify the X86 NR_CPUS Kconfig symbol/option by
introducing RANGE_BEGIN_CPUS, RANGE_END_CPUS, and DEF_CONFIG_CPUS.
Then combine some default values when their conditionals can be
reduced.

Also move the X86_BIGSMP kconfig option inside an "if X86_32"/"endif"
config block and drop its explicit "depends on X86_32".

Combine the max. 8192 cases of RANGE_END_CPUS (X86_64 only).
Split RANGE_END_CPUS and DEF_CONFIG_CPUS into separate cases for
X86_32 and X86_64.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0b833246-ed4b-e451-c426-c4464725be92@infradead.org
Link: lkml.kernel.org/r/CA+55aFzOd3j6ZUSkEwTdk85qtt1JywOtm3ZAb-qAvt8_hJ6D4A@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2bc2f688 06-Feb-2018 Kees Cook <keescook@chromium.org>

Makefile: move stack-protector availability out of Kconfig

Various portions of the kernel, especially per-architecture pieces,
need to know if the compiler is building with the stack protector.
This was done in the arch/Kconfig with 'select', but this doesn't
allow a way to do auto-detected compiler support. In preparation for
creating an on-if-available default, move the logic for the definition of
CONFIG_CC_STACKPROTECTOR into the Makefile.

Link: http://lkml.kernel.org/r/1510076320-69931-3-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 10bcc80e 29-Jan-2018 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

membarrier/x86: Provide core serializing command

There are two places where core serialization is needed by membarrier:

1) When returning from the membarrier IPI,
2) After scheduler updates curr to a thread with a different mm, before
going back to user-space, since the curr->mm is used by membarrier to
check whether it needs to send an IPI to that CPU.

x86-32 uses IRET as return from interrupt, and both IRET and SYSEXIT to go
back to user-space. The IRET instruction is core serializing, but not
SYSEXIT.

x86-64 uses IRET as return from interrupt, which takes care of the IPI.
However, it can return to user-space through either SYSRETL (compat
code), SYSRETQ, or IRET. Given that SYSRET{L,Q} is not core serializing,
we rely instead on write_cr3() performed by switch_mm() to provide core
serialization after changing the current mm, and deal with the special
case of kthread -> uthread (temporarily keeping current mm into
active_mm) by adding a sync_core() in that specific case.

Use the new sync_core_before_usermode() to guarantee this.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: David Sehr <sehr@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-10-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ac1ab12a 29-Jan-2018 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

lockin/x86: Implement sync_core_before_usermode()

Ensure that a core serializing instruction is issued before returning to
user-mode. x86 implements return to user-space through sysexit, sysrel,
and sysretq, which are not core serializing.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: David Sehr <sehr@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-8-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2e3ca40f 31-Jan-2018 Pavel Tatashin <pasha.tatashin@oracle.com>

mm: relax deferred struct page requirements

There is no need to have ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT, as all
the page initialization code is in common code.

Also, there is no need to depend on MEMORY_HOTPLUG, as initialization
code does not really use hotplug memory functionality. So, we can
remove this requirement as well.

This patch allows to use deferred struct page initialization on all
platforms with memblock allocator.

Tested on x86, arm64, and sparc. Also, verified that code compiles on
PPC with CONFIG_MEMORY_HOTPLUG disabled.

Link: http://lkml.kernel.org/r/20171117014601.31606-1-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c508c46e 23-Jan-2018 Benjamin Gilbert <benjamin.gilbert@coreos.com>

firmware: Fix up docs referring to FIRMWARE_IN_KERNEL

We've removed the option, so stop talking about it.

Signed-off-by: Benjamin Gilbert <benjamin.gilbert@coreos.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# f7d83c1c 16-Aug-2017 Kees Cook <keescook@chromium.org>

x86: Implement thread_struct whitelist for hardened usercopy

This whitelists the FPU register state portion of the thread_struct for
copying to userspace, instead of the default entire struct. This is needed
because FPU register state is dynamically sized, so it doesn't bypass the
hardened usercopy checks.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rik van Riel <riel@redhat.com>


# abde587b 15-Jan-2018 Arnd Bergmann <arnd@arndb.de>

x86/jailhouse: Add PCI dependency

Building jailhouse support without PCI results in a link error:

arch/x86/kernel/jailhouse.o: In function `jailhouse_init_platform':
jailhouse.c:(.init.text+0x235): undefined reference to `pci_probe'
arch/x86/kernel/jailhouse.o: In function `jailhouse_pci_arch_init':
jailhouse.c:(.init.text+0x265): undefined reference to `pci_direct_init'
jailhouse.c:(.init.text+0x26c): undefined reference to `pcibios_last_bus'

Add the missing Kconfig dependency.

Fixes: a0c01e4bb92d ("x86/jailhouse: Initialize PCI support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Link: https://lkml.kernel.org/r/20180115155150.51407-1-arnd@arndb.de


# 87e65d05 27-Nov-2017 Jan Kiszka <jan.kiszka@siemens.com>

x86/jailhouse: Enable PMTIMER

Jailhouse exposes the PMTIMER as only reference clock to all cells. Pick
up its address from the setup data. Allow to enable the Linux support of
it by relaxing its strict dependency on ACPI.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/6d5c3fadd801eb3fba9510e2d3db14a9c404a1a0.1511770314.git.jan.kiszka@siemens.com


# 4a362601 27-Nov-2017 Jan Kiszka <jan.kiszka@siemens.com>

x86/jailhouse: Add infrastructure for running in non-root cell

The Jailhouse hypervisor is able to statically partition a multicore
system into multiple so-called cells. Linux is used as boot loader and
continues to run in the root cell after Jailhouse is enabled. Linux can
also run in non-root cells.

Jailhouse does not emulate usual x86 devices. It also provides no
complex ACPI but basic platform information that the boot loader
forwards via setup data. This adds the infrastructure to detect when
running in a non-root cell so that the platform can be configured as
required in succeeding steps.

Support is limited to x86-64 so far, primarily because no boot loader
stub exists for i386 and, thus, we wouldn't be able to test the 32-bit
path.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: jailhouse-dev@googlegroups.com
Link: https://lkml.kernel.org/r/7f823d077b38b1a70c526b40b403f85688c137d3.1511770314.git.jan.kiszka@siemens.com


# 540adea3 12-Jan-2018 Masami Hiramatsu <mhiramat@kernel.org>

error-injection: Separate error-injection from kprobe

Since error-injection framework is not limited to be used
by kprobes, nor bpf. Other kernel subsystems can use it
freely for checking safeness of error-injection, e.g.
livepatch, ftrace etc.
So this separate error-injection framework from kprobes.

Some differences has been made:

- "kprobe" word is removed from any APIs/structures.
- BPF_ALLOW_ERROR_INJECTION() is renamed to
ALLOW_ERROR_INJECTION() since it is not limited for BPF too.
- CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
feature. It is automatically enabled if the arch supports
error injection feature for kprobe or ftrace etc.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 76b04384 11-Jan-2018 David Woodhouse <dwmw@amazon.co.uk>

x86/retpoline: Add initial retpoline support

Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.

This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
some circumstances, IBRS microcode features may be used instead, and the
retpoline can be disabled.

On AMD CPUs if lfence is serialising, the retpoline can be dramatically
simplified to a simple "lfence; jmp *\reg". A future patch, after it has
been verified that lfence really is serialising in all circumstances, can
enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
to X86_FEATURE_RETPOLINE.

Do not align the retpoline in the altinstr section, because there is no
guarantee that it stays aligned when it's copied over the oldinstr during
alternative patching.

[ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
[ tglx: Put actual function CALL/JMP in front of the macros, convert to
symbolic labels ]
[ dwmw2: Convert back to numeric labels, merge objtool fixes ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk


# ea8c64ac 10-Jan-2018 Christoph Hellwig <hch@lst.de>

dma-mapping: move swiotlb arch helpers to a new header

phys_to_dma, dma_to_phys and dma_capable are helpers published by
architecture code for use of swiotlb and xen-swiotlb only. Drivers are
not supposed to use these directly, but use the DMA API instead.

Move these to a new asm/dma-direct.h helper, included by a
linux/dma-direct.h wrapper that provides the default linear mapping
unless the architecture wants to override it.

In the MIPS case the existing dma-coherent.h is reused for now as
untangling it will take a bit of work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>


# f328299e 29-Dec-2017 Eric Biggers <ebiggers@google.com>

locking/refcounts: Remove stale comment from the ARCH_HAS_REFCOUNT Kconfig entry

ARCH_HAS_REFCOUNT is no longer marked as broken ('if BROKEN'), so remove
the stale comment regarding it being broken.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171229195303.17781-1-ebiggers3@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 61dc0f55 07-Jan-2018 Thomas Gleixner <tglx@linutronix.de>

x86/cpu: Implement CPU vulnerabilites sysfs functions

Implement the CPU vulnerabilty show functions for meltdown, spectre_v1 and
spectre_v2.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180107214913.177414879@linutronix.de


# 7bbcbd3d 20-Dec-2017 Thomas Gleixner <tglx@linutronix.de>

x86/Kconfig: Limit NR_CPUS on 32-bit to a sane amount

The recent cpu_entry_area changes fail to compile on 32-bit when BIGSMP=y
and NR_CPUS=512, because the fixmap area becomes too big.

Limit the number of CPUs with BIGSMP to 64, which is already way to big for
32-bit, but it's at least a working limitation.

We performed a quick survey of 32-bit-only machines that might be affected
by this change negatively, but found none.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2aeb0736 15-Nov-2017 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow

[ Note, this is a Git cherry-pick of the following commit:

d17a1d97dc20: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")

... for easier x86 PTI code testing and back-porting. ]

The KASAN shadow is currently mapped using vmemmap_populate() since that
provides a semi-convenient way to map pages into init_top_pgt. However,
since that no longer zeroes the mapped pages, it is not suitable for
KASAN, which requires zeroed shadow memory.

Add kasan_populate_shadow() interface and use it instead of
vmemmap_populate(). Besides, this allows us to take advantage of
gigantic pages and use them to populate the shadow, which should save us
some memory wasted on page tables and reduce TLB pressure.

Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9802d865 11-Dec-2017 Josef Bacik <jbacik@fb.com>

bpf: add a bpf_override_function helper

Error injection is sloppy and very ad-hoc. BPF could fill this niche
perfectly with it's kprobe functionality. We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations. Accomplish this with the bpf_override_funciton
helper. This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function. This gives us a nice
clean way to implement systematic error injection for all of our code
paths.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# f68d62a5 15-Nov-2017 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow

[ Note, this commit is a cherry-picked version of:

d17a1d97dc20: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")

... for easier x86 entry code testing and back-porting. ]

The KASAN shadow is currently mapped using vmemmap_populate() since that
provides a semi-convenient way to map pages into init_top_pgt. However,
since that no longer zeroes the mapped pages, it is not suitable for
KASAN, which requires zeroed shadow memory.

Add kasan_populate_shadow() interface and use it instead of
vmemmap_populate(). Besides, this allows us to take advantage of
gigantic pages and use them to populate the shadow, which should save us
some memory wasted on page tables and reduce TLB pressure.

Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d17a1d97 15-Nov-2017 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86/mm/kasan: don't use vmemmap_populate() to initialize shadow

The kasan shadow is currently mapped using vmemmap_populate() since that
provides a semi-convenient way to map pages into init_top_pgt. However,
since that no longer zeroes the mapped pages, it is not suitable for
kasan, which requires zeroed shadow memory.

Add kasan_populate_shadow() interface and use it instead of
vmemmap_populate(). Besides, this allows us to take advantage of
gigantic pages and use them to populate the shadow, which should save us
some memory wasted on page tables and reduce TLB pressure.

Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4675ff05 15-Nov-2017 Levin, Alexander (Sasha Levin) <alexander.levin@verizon.com>

kmemcheck: rip it out

Fix up makefiles, remove references, and git rm kmemcheck.

Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 796ebc81 13-Nov-2017 Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

x86/umip: Select X86_INTEL_UMIP by default

UMIP does cause any performance penalty to the vast majority of x86 code
that does not use the legacy instructions affected by UMIP.

Also describe UMIP more accurately and explain the behavior that can be
expected by the (few) applications that use the affected instructions.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1510640985-18412-2-git-send-email-ricardo.neri-calderon@linux.intel.com
[ Spelling fixes, rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f3edacbd 11-Nov-2017 David S. Miller <davem@davemloft.net>

bpf: Revert bpf_overrid_function() helper changes.

NACK'd by x86 maintainer.

Signed-off-by: David S. Miller <davem@davemloft.net>


# dd0bb688 07-Nov-2017 Josef Bacik <jbacik@fb.com>

bpf: add a bpf_override_function helper

Error injection is sloppy and very ad-hoc. BPF could fill this niche
perfectly with it's kprobe functionality. We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations. Accomplish this with the bpf_override_funciton
helper. This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function. This gives us a nice
clean way to implement systematic error injection for all of our code
paths.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aa35f896 05-Nov-2017 Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

x86/umip: Enable User-Mode Instruction Prevention at runtime

User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
bit in %cr4.

It makes sense to enable UMIP at some point while booting, before user
spaces come up. Like SMAP and SMEP, is not critical to have it enabled
very early during boot. This is because UMIP is relevant only when there is
a user space to be protected from. Given these similarities, UMIP can be
enabled along with SMAP and SMEP.

At the moment, UMIP is disabled by default at build time. It can be enabled
at build time by selecting CONFIG_X86_INTEL_UMIP. If enabled at build time,
it can be disabled at run time by adding clearcpuid=514 to the kernel
parameters.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1509935277-22138-10-git-send-email-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 12a8cc7f 29-Sep-2017 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86/kasan: Use the same shadow offset for 4- and 5-level paging

We are going to support boot-time switching between 4- and 5-level
paging. For KASAN it means we cannot have different KASAN_SHADOW_OFFSET
for different paging modes: the constant is passed to gcc to generate
code and cannot be changed at runtime.

This patch changes KASAN code to use 0xdffffc0000000000 as shadow offset
for both 4- and 5-level paging.

For 5-level paging it means that shadow memory region is not aligned to
PGD boundary anymore and we have to handle unaligned parts of the region
properly.

In addition, we have to exclude paravirt code from KASAN instrumentation
as we now use set_pgd() before KASAN is fully ready.

[kirill.shutemov@linux.intel.com: clenaup, changelog message]
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170929140821.37654-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c201c917 17-Oct-2017 Thomas Gleixner <tglx@linutronix.de>

x86/vector/msi: Select CONFIG_GENERIC_IRQ_RESERVATION_MODE

Select CONFIG_GENERIC_IRQ_RESERVATION_MODE so PCI/MSI domains get the
MSI_FLAG_MUST_REACTIVATE flag set in pci_msi_create_irq_domain().

Remove the explicit setters of this flag in the apic/msi code as they are
not longer required.

Fixes: 4900be83602b ("x86/vector/msi: Switch to global reservation mode")
Reported-and-tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poulson <jopoulso@microsoft.com>
Cc: Mihai Costache <v-micos@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-pci@vger.kernel.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: devel@linuxdriverproject.org
Cc: KY Srinivasan <kys@microsoft.com>
Link: https://lkml.kernel.org/r/20171017075600.527569354@linutronix.de


# 11af8474 13-Oct-2017 Josh Poimboeuf <jpoimboe@redhat.com>

x86/unwind: Rename unwinder config options to 'CONFIG_UNWINDER_*'

Rename the unwinder config options from:

CONFIG_ORC_UNWINDER
CONFIG_FRAME_POINTER_UNWINDER
CONFIG_GUESS_UNWINDER

to:

CONFIG_UNWINDER_ORC
CONFIG_UNWINDER_FRAME_POINTER
CONFIG_UNWINDER_GUESS

... in order to give them a more logical config namespace.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/73972fc7e2762e91912c6b9584582703d6f1b8cc.1507924831.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 39208aa7 02-Sep-2017 Kees Cook <keescook@chromium.org>

locking/refcounts, x86/asm: Enable CONFIG_ARCH_HAS_REFCOUNT

With the section inlining bug fixed for the x86 refcount protection,
we can turn the config back on.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1504382986-49301-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0fa115da 13-Sep-2017 Thomas Gleixner <tglx@linutronix.de>

x86/irq/vector: Initialize matrix allocator

Initialize the matrix allocator and add the proper accounting points to the
code.

No functional change, just preparation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213155.108410660@linutronix.de


# 3072e413 08-Sep-2017 Michal Hocko <mhocko@suse.com>

mm/memory_hotplug: introduce add_pages

There are new users of memory hotplug emerging. Some of them require
different subset of arch_add_memory. There are some which only require
allocation of struct pages without mapping those pages to the kernel
address space. We currently have __add_pages for that purpose. But this
is rather lowlevel and not very suitable for the code outside of the
memory hotplug. E.g. x86_64 wants to update max_pfn which should be done
by the caller. Introduce add_pages() which should care about those
details if they are needed. Each architecture should define its
implementation and select CONFIG_ARCH_HAS_ADD_PAGES. All others use the
currently existing __add_pages.

Link: http://lkml.kernel.org/r/20170817000548.32038-7-jglisse@redhat.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9c670ea3 08-Sep-2017 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

mm: thp: introduce CONFIG_ARCH_ENABLE_THP_MIGRATION

Introduce CONFIG_ARCH_ENABLE_THP_MIGRATION to limit thp migration
functionality to x86_64, which should be safer at the first step.

Link: http://lkml.kernel.org/r/20170717193955.20207-5-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# df3735c5 06-Sep-2017 Rik van Riel <riel@redhat.com>

x86,mpx: make mpx depend on x86-64 to free up VMA flag

Patch series "mm,fork,security: introduce MADV_WIPEONFORK", v4.

If a child process accesses memory that was MADV_WIPEONFORK, it will get
zeroes. The address ranges are still valid, they are just empty.

If a child process accesses memory that was MADV_DONTFORK, it will get a
segmentation fault, since those address ranges are no longer valid in
the child after fork.

Since MADV_DONTFORK also seems to be used to allow very large programs
to fork in systems with strict memory overcommit restrictions, changing
the semantics of MADV_DONTFORK might break existing programs.

The use case is libraries that store or cache information, and want to
know that they need to regenerate it in the child process after fork.

Examples of this would be:
- systemd/pulseaudio API checks (fail after fork) (replacing a getpid
check, which is too slow without a PID cache)
- PKCS#11 API reinitialization check (mandated by specification)
- glibc's upcoming PRNG (reseed after fork)
- OpenSSL PRNG (reseed after fork)

The security benefits of a forking server having a re-inialized PRNG in
every child process are pretty obvious. However, due to libraries
having all kinds of internal state, and programs getting compiled with
many different versions of each library, it is unreasonable to expect
calling programs to re-initialize everything manually after fork.

A further complication is the proliferation of clone flags, programs
bypassing glibc's functions to call clone directly, and programs calling
unshare, causing the glibc pthread_atfork hook to not get called.

It would be better to have the kernel take care of this automatically.

The patchset also adds MADV_KEEPONFORK, to undo the effects of a prior
MADV_WIPEONFORK.

This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

https://man.openbsd.org/minherit.2

This patch (of 2):

MPX only seems to be available on 64 bit CPUs, starting with Skylake and
Goldmont. Move VM_MPX into the 64 bit only portion of vma->vm_flags, in
order to free up a VMA flag.

Link: http://lkml.kernel.org/r/20170811212829.29186-2-riel@redhat.com
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Colm MacCártaigh <colm@allcosts.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5deb67f7 30-Aug-2017 Robin Murphy <robin.murphy@arm.com>

libnvdimm, nd_blk: remove mmio_flush_range()

mmio_flush_range() suffers from a lack of clearly-defined semantics,
and is somewhat ambiguous to port to other architectures where the
scope of the writeback implied by "flush" and ordering might matter,
but MMIO would tend to imply non-cacheable anyway. Per the rationale
in 67a3e8fe9015 ("nd_blk: change aperture mapping from WC to WB"), the
only existing use is actually to invalidate clean cache lines for
ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
cleanup of the pmem API, that also now happens to be the exact purpose
of arch_invalidate_pmem(), which would be a far more well-defined tool
for the job.

Rather than risk potentially inconsistent implementations of
mmio_flush_range() for the sake of one callsite, streamline things by
removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
definitions up to the libnvdimm level, so they can be shared by NFIT
as well. This allows NFIT to be enabled for arm64.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 9e52fc2b 28-Aug-2017 Vitaly Kuznetsov <vkuznets@redhat.com>

x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)

There's a subtle bug in how some of the paravirt guest code handles
page table freeing on x86:

On x86 software page table walkers depend on the fact that remote TLB flush
does an IPI: walk is performed lockless but with interrupts disabled and in
case the page table is freed the freeing CPU will get blocked as remote TLB
flush is required. On other architectures which don't require an IPI to do
remote TLB flush we have an RCU-based mechanism (see
include/asm-generic/tlb.h for more details).

In virtualized environments we may want to override the ->flush_tlb_others
callback in pv_mmu_ops and use a hypercall asking the hypervisor to do a
remote TLB flush for us. This breaks the assumption about IPIs. Xen PV has
been doing this for years and the upcoming remote TLB flush for Hyper-V will
do it too.

This is not safe, as software page table walkers may step on an already
freed page.

Fix the bug by enabling the RCU-based page table freeing mechanism,
CONFIG_HAVE_RCU_TABLE_FREE=y.

Testing with kernbench and mmap/munmap microbenchmarks, and neither showed
any noticeable performance impact.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170828082251.5562-1-vkuznets@redhat.com
[ Rewrote/fixed/clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7b3d61cc 29-Aug-2017 Ingo Molnar <mingo@kernel.org>

locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being

Mike Galbraith bisected a boot crash back to the following commit:

7a46ec0e2f48 ("locking/refcounts, x86/asm: Implement fast refcount overflow protection")

The crash/hang pattern is:

> Symptom is a few splats as below, with box finally hanging. Network
> comes up, but neither ssh nor console login is possible.
>
> ------------[ cut here ]------------
> WARNING: CPU: 4 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
> ...
> __sk_destruct()
> rcu_process_callbacks()
> __do_softirq()
> irq_exit()
> smp_apic_timer_interrupt()
> apic_timer_interrupt()

We are at -rc7 already, and the code has grown some dependencies, so
instead of a plain revert disable the config temporarily, in the hope
of getting real fixes.

Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/tip-7a46ec0e2f4850407de5e1d19a44edee6efa58ec@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ecda85e7 16-Aug-2017 Juergen Gross <jgross@suse.com>

x86/lguest: Remove lguest support

Lguest seems to be rather unused these days. It has seen only patches
ensuring it still builds the last two years and its official state is
"Odd Fixes".

Remove it in order to be able to clean up the paravirt code.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 92e5aae4 18-Aug-2017 Nicholas Piggin <npiggin@gmail.com>

kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog

Commit 05a4a9527931 ("kernel/watchdog: split up config options") lost
the perf-based hardlockup detector's dependency on PERF_EVENTS, which
can result in broken builds with some powerpc configurations.

Restore the dependency. Add it in for x86 too, despite x86 always
selecting PERF_EVENTS it seems reasonable to make the dependency
explicit.

Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com
Fixes: 05a4a9527931 ("kernel/watchdog: split up config options")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7edaeb68 15-Aug-2017 Thomas Gleixner <tglx@linutronix.de>

kernel/watchdog: Prevent false positives with turbo modes

The hardlockup detector on x86 uses a performance counter based on unhalted
CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
performance counter period, so the hrtimer should fire 2-3 times before the
performance counter NMI fires. The NMI code checks whether the hrtimer
fired since the last invocation. If not, it assumess a hard lockup.

The calculation of those periods is based on the nominal CPU
frequency. Turbo modes increase the CPU clock frequency and therefore
shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
nominal frequency) the perf/NMI period is shorter than the hrtimer period
which leads to false positives.

A simple fix would be to shorten the hrtimer period, but that comes with
the side effect of more frequent hrtimer and softlockup thread wakeups,
which is not desired.

Implement a low pass filter, which checks the perf/NMI period against
kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
elapsed then the event is ignored and postponed to the next perf/NMI.

That solves the problem and avoids the overhead of shorter hrtimer periods
and more frequent softlockup thread wakeups.

Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and softlockup detector")
Reported-and-tested-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: dzickus@redhat.com
Cc: prarit@redhat.com
Cc: ak@linux.intel.com
Cc: babu.moger@oracle.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: stable@vger.kernel.org
Cc: atomlin@redhat.com
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos


# 7a46ec0e 15-Aug-2017 Kees Cook <keescook@chromium.org>

locking/refcounts, x86/asm: Implement fast refcount overflow protection

This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.

This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
the handler saturates the refcount_t to INT_MIN / 2. With this overflow
protection, the erroneous reference release that would follow a wrap back
to zero is blocked from happening, avoiding the class of refcount-overflow
use-after-free vulnerabilities entirely.

Only the overflow case of refcounting can be perfectly protected, since
it can be detected and stopped before the reference is freed and left to
be abused by an attacker. There isn't a way to block early decrements,
and while REFCOUNT_FULL stops increment-from-zero cases (which would
be the state _after_ an early decrement and stops potential double-free
conditions), this fast implementation does not, since it would require
the more expensive cmpxchg loops. Since the overflow case is much more
common (e.g. missing a "put" during an error path), this protection
provides real-world protection. For example, the two public refcount
overflow use-after-free exploits published in 2016 would have been
rendered unexploitable:

http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/

http://cyseclabs.com/page?n=02012016

This implementation does, however, notice an unchecked decrement to zero
(i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
resulted in a zero). Decrements under zero are noticed (since they will
have resulted in a negative value), though this only indicates that a
use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them reported than allow such
conditions to remain universally silent.

On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value) and a report and stack trace are
produced. When operations detect only negative value results (such as
changing an already saturated value), saturation still happens but no
notification is performed (since the value was already saturated).

On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.

As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Example assembly comparison:

refcount_inc() before:

.text:
ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp)

refcount_inc() after:

.text:
ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp)
ffffffff8154614d: 0f 88 80 d5 17 00 js ffffffff816c36d3
...
.text.unlikely:
ffffffff816c36d3: 48 8d 4d f4 lea -0xc(%rbp),%rcx
ffffffff816c36d7: 0f ff (bad)

These are the cycle counts comparing a loop of refcount_inc() from 1
to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
(refcount_t-full), and this overflow-protected refcount (refcount_t-fast):

2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
cycles protections
atomic_t 82249267387 none
refcount_t-fast 82211446892 overflow, untested dec-to-zero
refcount_t-full 144814735193 overflow, untested dec-to-zero, inc-from-zero

This code is a modified version of the x86 PAX_REFCOUNT atomic_t
overflow defense from the last public patch of PaX/grsecurity, based
on my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code. Thanks
to PaX Team for various suggestions for improvement for repurposing this
code to be a refcount-only protection.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f01d7d51 25-Jul-2017 Vikas Shivappa <vikas.shivappa@linux.intel.com>

x86/intel_rdt: Introduce a common compile option for RDT

We currently have a CONFIG_RDT_A which is for RDT(Resource directory
technology) allocation based resctrl filesystem interface. As a
preparation to add support for RDT monitoring as well into the same
resctrl filesystem, change the config option to be CONFIG_RDT which
would include both RDT allocation and monitoring code.

No functional change.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: davidcc@google.com
Cc: reinette.chatre@intel.com
Link: http://lkml.kernel.org/r/1501017287-28083-4-git-send-email-vikas.shivappa@linux.intel.com


# 81d38719 25-Jul-2017 Josh Poimboeuf <jpoimboe@redhat.com>

x86/kconfig: Consolidate unwinders into multiple choice selection

There are three mutually exclusive unwinders. Make that more obvious by
combining them into a multiple-choice selection:

CONFIG_FRAME_POINTER_UNWINDER
CONFIG_ORC_UNWINDER
CONFIG_GUESS_UNWINDER (if CONFIG_EXPERT=y)

Frame pointers are still the default (for now).

The old CONFIG_FRAME_POINTER option is still used in some
arch-independent places, so keep it around, but make it
invisible to the user on x86 - it's now selected by
CONFIG_FRAME_POINTER_UNWINDER=y.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/20170725135424.zukjmgpz3plf5pmt@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ee9f8fce 24-Jul-2017 Josh Poimboeuf <jpoimboe@redhat.com>

x86/unwind: Add the ORC unwinder

Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
It plugs into the existing x86 unwinder framework.

It relies on objtool to generate the needed .orc_unwind and
.orc_unwind_ip sections.

For more details on why ORC is used instead of DWARF, see
Documentation/x86/orc-unwinder.txt - but the short version is
that it's a simplified, fundamentally more robust debugninfo
data structure, which also allows up to two orders of magnitude
faster lookups than the DWARF unwinder - which matters to
profiling workloads like perf.

Thanks to Andy Lutomirski for the performance improvement ideas:
splitting the ORC unwind table into two parallel arrays and creating a
fast lookup table to search a subset of the unwind table.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
[ Extended the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 77ef56e4 16-Jul-2017 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y

Most of things are in place and we can enable support for 5-level paging.

The patch makes XEN_PV and XEN_PVH dependent on !X86_5LEVEL. Both are
not ready to work with 5-level paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170716225954.74185-9-kirill.shutemov@linux.intel.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f88a68fa 17-Jul-2017 Tom Lendacky <thomas.lendacky@amd.com>

x86/mm: Extend early_memremap() support with additional attrs

Add early_memremap() support to be able to specify encrypted and
decrypted mappings with and without write-protection. The use of
write-protection is necessary when encrypting data "in place". The
write-protect attribute is considered cacheable for loads, but not
stores. This implies that the hardware will never give the core a
dirty line with this memtype.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/479b5832c30fae3efa7932e48f81794e86397229.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7744ccdb 17-Jul-2017 Tom Lendacky <thomas.lendacky@amd.com>

x86/mm: Add Secure Memory Encryption (SME) support

Add support for Secure Memory Encryption (SME). This initial support
provides a Kconfig entry to build the SME support into the kernel and
defines the memory encryption mask that will be used in subsequent
patches to mark pages as encrypted.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/a6c34d16caaed3bc3e2d6f0987554275bd291554.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6974f0c4 12-Jul-2017 Daniel Micay <danielmicay@gmail.com>

include/linux/string.h: add the option of fortified string.h functions

This adds support for compiling with a rough equivalent to the glibc
_FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
overflow checks for string.h functions when the compiler determines the
size of the source or destination buffer at compile-time. Unlike glibc,
it covers buffer reads in addition to writes.

GNU C __builtin_*_chk intrinsics are avoided because they would force a
much more complex implementation. They aren't designed to detect read
overflows and offer no real benefit when using an implementation based
on inline checks. Inline checks don't add up to much code size and
allow full use of the regular string intrinsics while avoiding the need
for a bunch of _chk functions and per-arch assembly to avoid wrapper
overhead.

This detects various overflows at compile-time in various drivers and
some non-x86 core kernel code. There will likely be issues caught in
regular use at runtime too.

Future improvements left out of initial implementation for simplicity,
as it's all quite optional and can be done incrementally:

* Some of the fortified string functions (strncpy, strcat), don't yet
place a limit on reads from the source based on __builtin_object_size of
the source buffer.

* Extending coverage to more string functions like strlcat.

* It should be possible to optionally use __builtin_object_size(x, 1) for
some functions (C strings) to detect intra-object overflows (like
glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
approach to avoid likely compatibility issues.

* The compile-time checks should be made available via a separate config
option which can be enabled by default (or always enabled) once enough
time has passed to get the issues it catches fixed.

Kees said:
"This is great to have. While it was out-of-tree code, it would have
blocked at least CVE-2016-3858 from being exploitable (improper size
argument to strlcpy()). I've sent a number of fixes for
out-of-bounds-reads that this detected upstream already"

[arnd@arndb.de: x86: fix fortified memcpy]
Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
[keescook@chromium.org: avoid panic() in favor of BUG()]
Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
[keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 05a4a952 12-Jul-2017 Nicholas Piggin <npiggin@gmail.com>

kernel/watchdog: split up config options

Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.

LOCKUP_DETECTOR implies the general boot, sysctl, and programming
interfaces for the lockup detectors.

An architecture that wants to use a hard lockup detector must define
HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.

Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the
minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and
does not implement the LOCKUP_DETECTOR interfaces.

sparc is unusual in that it has started to implement some of the
interfaces, but not fully yet. It should probably be converted to a full
HAVE_HARDLOCKUP_DETECTOR_ARCH.

[npiggin@gmail.com: fix]
Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com
Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Tested-by: Babu Moger <babu.moger@oracle.com> [sparc]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e1073d1e 06-Jul-2017 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

mm/hugetlb: clean up ARCH_HAS_GIGANTIC_PAGE

This moves the #ifdef in C code to a Kconfig dependency. Also we move
the gigantic_page_supported() function to be arch specific.

This allows architectures to conditionally enable runtime allocation of
gigantic huge page. Architectures like ppc64 supports different
gigantic huge page size (16G and 1G) based on the translation mode
selected. This provides an opportunity for ppc64 to enable runtime
allocation only w.r.t 1G hugepage.

No functional change in this patch.

Link: http://lkml.kernel.org/r/1494995292-4443-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 38d8b4e6 06-Jul-2017 Huang Ying <ying.huang@intel.com>

mm, THP, swap: delay splitting THP during swap out

Patch series "THP swap: Delay splitting THP during swapping out", v11.

This patchset is to optimize the performance of Transparent Huge Page
(THP) swap.

Recently, the performance of the storage devices improved so fast that
we cannot saturate the disk bandwidth with single logical CPU when do
page swap out even on a high-end server machine. Because the
performance of the storage device improved faster than that of single
logical CPU. And it seems that the trend will not change in the near
future. On the other hand, the THP becomes more and more popular
because of increased memory size. So it becomes necessary to optimize
THP swap performance.

The advantages of the THP swap support include:

- Batch the swap operations for the THP to reduce lock
acquiring/releasing, including allocating/freeing the swap space,
adding/deleting to/from the swap cache, and writing/reading the swap
space, etc. This will help improve the performance of the THP swap.

- The THP swap space read/write will be 2M sequential IO. It is
particularly helpful for the swap read, which are usually 4k random
IO. This will improve the performance of the THP swap too.

- It will help the memory fragmentation, especially when the THP is
heavily used by the applications. The 2M continuous pages will be
free up after THP swapping out.

- It will improve the THP utilization on the system with the swap
turned on. Because the speed for khugepaged to collapse the normal
pages into the THP is quite slow. After the THP is split during the
swapping out, it will take quite long time for the normal pages to
collapse back into the THP after being swapped in. The high THP
utilization helps the efficiency of the page based memory management
too.

There are some concerns regarding THP swap in, mainly because possible
enlarged read/write IO size (for swap in/out) may put more overhead on
the storage device. To deal with that, the THP swap in should be turned
on only when necessary. For example, it can be selected via
"always/never/madvise" logic, to be turned on globally, turned off
globally, or turned on only for VMA with MADV_HUGEPAGE, etc.

This patchset is the first step for the THP swap support. The plan is
to delay splitting THP step by step, finally avoid splitting THP during
the THP swapping out and swap out/in the THP as a whole.

As the first step, in this patchset, the splitting huge page is delayed
from almost the first step of swapping out to after allocating the swap
space for the THP and adding the THP into the swap cache. This will
reduce lock acquiring/releasing for the locks used for the swap cache
management.

With the patchset, the swap out throughput improves 15.5% (from about
3.73GB/s to about 4.31GB/s) in the vm-scalability swap-w-seq test case
with 8 processes. The test is done on a Xeon E5 v3 system. The swap
device used is a RAM simulated PMEM (persistent memory) device. To test
the sequential swapping out, the test case creates 8 processes, which
sequentially allocate and write to the anonymous pages until the RAM and
part of the swap device is used up.

This patch (of 5):

In this patch, splitting huge page is delayed from almost the first step
of swapping out to after allocating the swap space for the THP
(Transparent Huge Page) and adding the THP into the swap cache. This
will batch the corresponding operation, thus improve THP swap out
throughput.

This is the first step for the THP swap optimization. The plan is to
delay splitting the THP step by step and avoid splitting the THP
finally.

In this patch, one swap cluster is used to hold the contents of each THP
swapped out. So, the size of the swap cluster is changed to that of the
THP (Transparent Huge Page) on x86_64 architecture (512). For other
architectures which want such THP swap optimization,
ARCH_USES_THP_SWAP_CLUSTER needs to be selected in the Kconfig file for
the architecture. In effect, this will enlarge swap cluster size by 2
times on x86_64. Which may make it harder to find a free cluster when
the swap space becomes fragmented. So that, this may reduce the
continuous swap space allocation and sequential write in theory. The
performance test in 0day shows no regressions caused by this.

In the future of THP swap optimization, some information of the swapped
out THP (such as compound map count) will be recorded in the
swap_cluster_info data structure.

The mem cgroup swap accounting functions are enhanced to support charge
or uncharge a swap cluster backing a THP as a whole.

The swap cluster allocate/free functions are added to allocate/free a
swap cluster for a THP. A fair simple algorithm is used for swap
cluster allocation, that is, only the first swap device in priority list
will be tried to allocate the swap cluster. The function will fail if
the trying is not successful, and the caller will fallback to allocate a
single swap slot instead. This works good enough for normal cases. If
the difference of the number of the free swap clusters among multiple
swap devices is significant, it is possible that some THPs are split
earlier than necessary. For example, this could be caused by big size
difference among multiple swap devices.

The swap cache functions is enhanced to support add/delete THP to/from
the swap cache as a set of (HPAGE_PMD_NR) sub-pages. This may be
enhanced in the future with multi-order radix tree. But because we will
split the THP soon during swapping out, that optimization doesn't make
much sense for this first step.

The THP splitting functions are enhanced to support to split THP in swap
cache during swapping out. The page lock will be held during allocating
the swap cluster, adding the THP into the swap cache and splitting the
THP. So in the code path other than swapping out, if the THP need to be
split, the PageSwapCache(THP) will be always false.

The swap cluster is only available for SSD, so the THP swap optimization
in this patchset has no effect for HDD.

[ying.huang@intel.com: fix two issues in THP optimize patch]
Link: http://lkml.kernel.org/r/87k25ed8zo.fsf@yhuang-dev.intel.com
[hannes@cmpxchg.org: extensive cleanups and simplifications, reduce code size]
Link: http://lkml.kernel.org/r/20170515112522.32457-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org> [for config option]
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [for changes in huge_memory.c and huge_mm.h]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 65f7d049 27-Jun-2017 Oliver O'Halloran <oohall@gmail.com>

mm, x86: Add ARCH_HAS_ZONE_DEVICE to Kconfig

Currently ZONE_DEVICE depends on X86_64 and this will get unwieldly as
new architectures (and platforms) get ZONE_DEVICE support. Move to an
arch selected Kconfig option to save us the trouble.

Cc: linux-mm@kvack.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# df65c1bc 16-Mar-2017 Thomas Gleixner <tglx@linutronix.de>

x86/PCI: Select CONFIG_PCI_LOCKLESS_CONFIG

All x86 PCI configuration space accessors have either their own
serialization or can operate completely lockless (ECAM).

Disable the global lock in the generic PCI configuration space accessors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <helgaas@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/20170316215057.295079391@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c7d6c9dd 19-Jun-2017 Thomas Gleixner <tglx@linutronix.de>

x86/apic: Implement effective irq mask update

Add the effective irq mask update to the apic implementations and enable
effective irq masks for x86.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.878370703@linutronix.de


# ad7a929f 19-Jun-2017 Thomas Gleixner <tglx@linutronix.de>

x86/irq: Use irq_migrate_all_off_this_cpu()

The generic migration code supports all the required features
already. Remove the x86 specific implementation and use the generic one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235445.851311033@linutronix.de


# bc8e80d5 13-Jun-2017 Borislav Petkov <bp@suse.de>

x86/mce: Merge mce_amd_inj into mce-inject

Reuse mce_amd_inj's debugfs interface so that mce-inject can
benefit from it too. The old functionality is still preserved under
CONFIG_X86_MCELOG_LEGACY.

Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20170613162835.30750-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e585513b 06-Jun-2017 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation

This patch provides all required callbacks required by the generic
get_user_pages_fast() code and switches x86 over - and removes
the platform specific implementation.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0aed55af 29-May-2017 Dan Williams <dan.j.williams@intel.com>

x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations

The pmem driver has a need to transfer data with a persistent memory
destination and be able to rely on the fact that the destination writes are not
cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
around and fence previous writes with an "sfence".

Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, that guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
otherwise.

This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics it should be something private to
that driver [1], and Al's concern that anything uaccess related belongs with
the rest of the uaccess code [2].

The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 47b2c3ff 08-Jun-2017 Bilal Amarni <bilal.amarni@gmail.com>

security/keys: add CONFIG_KEYS_COMPAT to Kconfig

CONFIG_KEYS_COMPAT is defined in arch-specific Kconfigs and is missing for
several 64-bit architectures : mips, parisc, tile.

At the moment and for those architectures, calling in 32-bit userspace the
keyctl syscall would return an ENOSYS error.

This patch moves the CONFIG_KEYS_COMPAT option to security/keys/Kconfig, to
make sure the compatibility wrapper is registered by default for any 64-bit
architecture as long as it is configured with CONFIG_COMPAT.

[DH: Modified to remove arm64 compat enablement also as requested by Eric
Biggers]

Signed-off-by: Bilal Amarni <bilal.amarni@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
cc: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>


# ce4a4e56 28-May-2017 Andy Lutomirski <luto@kernel.org>

x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code

The UP asm/tlbflush.h generates somewhat nicer code than the SMP version.
Aside from that, it's fallen quite a bit behind the SMP code:

- flush_tlb_mm_range() didn't flush individual pages if the range
was small.

- The lazy TLB code was much weaker. This usually wouldn't matter,
but, if a kernel thread flushed its lazy "active_mm" more than
once (due to reclaim or similar), it wouldn't be unlazied and
would instead pointlessly flush repeatedly.

- Tracepoints were missing.

Aside from that, simply having the UP code around was a maintanence
burden, since it means that any change to the TLB flush code had to
make sure not to break it.

Simplify everything by deleting the UP code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c9525a3f 20-May-2017 Benjamin Peterson <bp@benjamin.pe>

x86/watchdog: Fix Kconfig help text file path reference to lockup watchdog documentation

Signed-off-by: Benjamin Peterson <bp@benjamin.pe>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 9919cba7ff71147803c988521cc1ceb80e7f0f6d ("watchdog: Update documentation")
Link: http://lkml.kernel.org/r/20170521002016.13258-1-bp@benjamin.pe
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2fefc97b 05-Apr-2017 Al Viro <viro@zeniv.linux.org.uk>

HAVE_ARCH_HARDENED_USERCOPY is unconditional now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 701cac61 05-Apr-2017 Al Viro <viro@zeniv.linux.org.uk>

CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now

all architectures converted

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 6dd29b3d 23-Apr-2017 Ingo Molnar <mingo@kernel.org>

Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"

This reverts commit 2947ba054a4dabbd82848728d765346886050029.

Dan Williams reported dax-pmem kernel warnings with the following signature:

WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic

... and bisected it to this commit, which suggests possible memory corruption
caused by the x86 fast-GUP conversion.

He also pointed out:

"
This is similar to the backtrace when we were not properly handling
pud faults and was fixed with this commit: 220ced1676c4 "mm: fix
get_user_pages() vs device-dax pud mappings"

I've found some missing _devmap checks in the generic
get_user_pages_fast() path, but this does not fix the regression
[...]
"

So given that there are known bugs, and a pretty robust looking bisection
points to this commit suggesting that are unknown bugs in the conversion
as well, revert it for the time being - we'll re-try in v4.13.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: dann.frazier@canonical.com
Cc: dave.hansen@intel.com
Cc: steve.capper@linaro.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6807c846 18-Apr-2017 Ingo Molnar <mingo@kernel.org>

x86: Enable KASLR by default

KASLR is mature (and important) enough to be enabled by default on x86.

Also enable it by default in the defconfigs.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: dan.j.williams@intel.com
Cc: dave.jiang@intel.com
Cc: dyoung@redhat.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4c7c4483 30-Mar-2017 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Define virtual memory map for 5-level paging

The first part of memory map (up to %esp fixup) simply scales existing
map for 4-level paging by factor of 9 -- number of bits addressed by
the additional page table level.

The rest of the map is unchanged.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170330080731.65421-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# beba3a20 25-Mar-2017 Al Viro <viro@zeniv.linux.org.uk>

x86: switch to RAW_COPY_USER

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 5de97c9f 27-Mar-2017 Tony Luck <tony.luck@intel.com>

x86/mce: Factor out and deprecate the /dev/mcelog driver

Move all code relating to /dev/mcelog to a separate source file.
/dev/mcelog driver can now operate from the machine check notifier with
lowest prio.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Move the mce_helper and trigger functionality behind CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-6-bp@alien8.de
[ Renamed CONFIG_X86_MCELOG to CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 644e0e8d 23-Mar-2017 Steven Rostedt (VMware) <rostedt@goodmis.org>

x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set

x86_64 has had fentry support for some time. I did not add support to x86_32
as I was unsure if it will be used much in the future. It is still very much
used, and there's issues with function graph tracing with gcc playing around
with the mcount frames, causing function graph to panic. The fentry code
does not have this issue, and is able to cope as there is no frame to mess
up.

Note, this only adds support for fentry when DYNAMIC_FTRACE is set. There's
really no reason to not have that set, because the performance of the
machine drops significantly when it's not enabled.

Keep !DYNAMIC_FTRACE around to test it off, as there's still some archs
that have FTRACE but not DYNAMIC_FTRACE.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170323143446.052202377@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 2947ba05 16-Mar-2017 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation

This patch provides all required callbacks required by the generic
get_user_pages_fast() code and switches x86 over - and removes
the platform specific implementation.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dann Frazier <dann.frazier@canonical.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170316213906.89528-1-kirill.shutemov@linux.intel.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1b028f78 06-Mar-2017 Dmitry Safonov <0x7f454c46@gmail.com>

x86/mm: Introduce mmap_compat_base() for 32-bit mmap()

mmap() uses a base address, from which it starts to look for a free space
for allocation.

The base address is stored in mm->mmap_base, which is calculated during
exec(). The address depends on task's size, set rlimit for stack, ASLR
randomization. The base depends on the task size and the number of random
bits which are different for 64-bit and 32bit applications.

Due to the fact, that the base address is fixed, its mmap() from a compat
(32bit) syscall issued by a 64bit task will return a address which is based
on the 64bit base address and does not fit into the 32bit address space
(4GB). The returned pointer is truncated to 32bit, which results in an
invalid address.

To solve store a seperate compat address base plus a compat legacy address
base in mm_struct. These bases are calculated at exec() time and can be
used later to address the 32bit compat mmap() issued by 64 bit
applications.

As a consequence of this change 32-bit applications issuing a 64-bit
syscall (after doing a long jump) will get a 64-bit mapping now. Before
this change 32-bit applications always got a 32bit mapping.

[ tglx: Massaged changelog and added a comment ]

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: 0x7f454c46@gmail.com
Cc: linux-mm@kvack.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20170306141721.9188-4-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# af085d90 13-Feb-2017 Josh Poimboeuf <jpoimboe@redhat.com>

stacktrace/x86: add function for detecting reliable stack traces

For live patching and possibly other use cases, a stack trace is only
useful if it can be assured that it's completely reliable. Add a new
save_stack_trace_tsk_reliable() function to achieve that.

Note that if the target task isn't the current task, and the target task
is allowed to run, then it could be writing the stack while the unwinder
is reading it, resulting in possible corruption. So the caller of
save_stack_trace_tsk_reliable() must ensure that the task is either
'current' or inactive.

save_stack_trace_tsk_reliable() relies on the x86 unwinder's detection
of pt_regs on the stack. If the pt_regs are not user-mode registers
from a syscall, then they indicate an in-kernel interrupt or exception
(e.g. preemption or a page fault), in which case the stack is considered
unreliable due to the nature of frame pointers.

It also relies on the x86 unwinder's detection of other issues, such as:

- corrupted stack data
- stack grows the wrong way
- stack walk doesn't reach the bottom
- user didn't provide a large enough entries array

Such issues are reported by checking unwind_error() and !unwind_done().

Also add CONFIG_HAVE_RELIABLE_STACKTRACE so arch-independent code can
determine at build time whether the function is implemented.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Ingo Molnar <mingo@kernel.org> # for the x86 changes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# a00cc7d9 24-Feb-2017 Matthew Wilcox <willy@infradead.org>

mm, x86: add support for PUD-sized transparent hugepages

The current transparent hugepage code only supports PMDs. This patch
adds support for transparent use of PUDs with DAX. It does not include
support for anonymous pages. x86 support code also added.

Most of this patch simply parallels the work that was done for huge
PMDs. The only major difference is how the new ->pud_entry method in
mm_walk works. The ->pmd_entry method replaces the ->pte_entry method,
whereas the ->pud_entry method works along with either ->pmd_entry or
->pte_entry. The pagewalk code takes care of locking the PUD before
calling ->pud_walk, so handlers do not need to worry whether the PUD is
stable.

[dave.jiang@intel.com: fix SMP x86 32bit build for native_pud_clear()]
Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com
[dave.jiang@intel.com: native_pud_clear missing on i386 build]
Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alexander Kapshuk <alexander.kapshuk@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d2852a22 21-Feb-2017 Daniel Borkmann <daniel@iogearbox.net>

arch: add ARCH_HAS_SET_MEMORY config

Currently, there's no good way to test for the presence of
set_memory_ro/rw/x/nx() helpers implemented by archs such as
x86, arm, arm64 and s390.

There's DEBUG_SET_MODULE_RONX and DEBUG_RODATA, however both
don't really reflect that: set_memory_*() are also available
even when DEBUG_SET_MODULE_RONX is turned off, and DEBUG_RODATA
is set by parisc, but doesn't implement above functions. Thus,
add ARCH_HAS_SET_MEMORY that is selected by mentioned archs,
where generic code can test against this.

This also allows later on to move DEBUG_SET_MODULE_RONX out of
the arch specific Kconfig to define it only once depending on
ARCH_HAS_SET_MEMORY.

Suggested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ad21fc4f 06-Feb-2017 Laura Abbott <labbott@redhat.com>

arch: Move CONFIG_DEBUG_RODATA and CONFIG_SET_MODULE_RONX to be common

There are multiple architectures that support CONFIG_DEBUG_RODATA and
CONFIG_SET_MODULE_RONX. These options also now have the ability to be
turned off at runtime. Move these to an architecture independent
location and make these options def_bool y for almost all of those
arches.

Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>


# 5773ebfe 06-Feb-2017 Niklas Cassel <niklas.cassel@axis.com>

x86/kconfig: Remove misleading note regarding hibernation and KASLR

There used to be a restriction with KASLR and hibernation, but this is no
longer true, and since commit:

65fe935dd238 ("x86/KASLR, x86/power: Remove x86 hibernation restrictions")

the parameter "kaslr" does no longer exist.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Niklas Cassel <niklass@axis.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1486399429-23078-1-git-send-email-niklass@axis.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 80a7581f 22-Jan-2017 Irina Tirdea <irina.tirdea@intel.com>

arch/x86/platform/atom: Move pmc_atom to drivers/platform/x86

The pmc_atom driver does not contain any architecture specific
code. It only enables the SoC Power Management Controller driver
for BayTrail and CherryTrail platforms.

Move the pmc_atom driver from arch/x86/platform/atom to
drivers/platform/x86. Also clean-up and reorder include files by
alphabetical order in pmc_atom.h

Signed-off-by: Irina Tirdea <irina.tirdea@intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>


# d4b2ac63 23-Jan-2017 Borislav Petkov <bp@suse.de>

x86/ras/inject: Make it depend on X86_LOCAL_APIC=y

... and get rid of the annoying:

arch/x86/kernel/cpu/mcheck/mce-inject.c:97:13: warning: ‘mce_irq_ipi’ defined but not used [-Wunused-function]

when doing randconfig builds.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# fa5b6ec9 10-Jan-2017 Laura Abbott <labbott@redhat.com>

lib/Kconfig.debug: Add ARCH_HAS_DEBUG_VIRTUAL

DEBUG_VIRTUAL currently depends on DEBUG_KERNEL && X86. arm64 is getting
the same support. Rather than add a list of architectures, switch this
to ARCH_HAS_DEBUG_VIRTUAL and let architectures select it as
appropriate.

Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 6613d18e 31-Oct-2016 Vadim Pasternak <vadimp@mellanox.com>

platform/x86: mlx-platform: Move module from arch/x86

Since mlx-platform is not an architectural driver, it is moved out
of arch/x86/platform to drivers/platform/x86.
Relevant Makefile and Kconfig are updated.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# 0a21fc12 30-Nov-2016 Ingo Molnar <mingo@kernel.org>

sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable

Right now CONFIG_SCHED_MC_PRIO has X86_INTEL_PSTATE as a dependency,
which is not enabled by default and which hides the CONFIG_SCHED_MC_PRIO
hardware-enabling feature.

Select X86_INTEL_PSTATE instead, plus its dependency (CPU_FREQ), if the
user enables CONFIG_SCHED_MC_PRIO=y.

(Also align the CONFIG_SCHED_MC_PRIO Kconfig help text in standard style.)

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# de966cf4 29-Nov-2016 Tim Chen <tim.c.chen@linux.intel.com>

sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO

Rename CONFIG_SCHED_ITMT for Intel Turbo Boost Max Technology 3.0
to CONFIG_SCHED_MC_PRIO. This makes the configuration extensible
in future to other architectures that wish to similarly establish
CPU core priorities support in the scheduler.

The description in Kconfig is updated to reflect this change with
added details for better clarity. The configuration is explicitly
default-y, to enable the feature on CPUs that have this feature.

It has no effect on non-TBM3 CPUs.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/2b2ee29d93e3f162922d72d0165a1405864fbb23.1480444902.git.tim.c.chen@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5e76b2ab 22-Nov-2016 Tim Chen <tim.c.chen@linux.intel.com>

x86: Enable Intel Turbo Boost Max Technology 3.0

On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum
turbo frequencies of some cores in a CPU package may be higher than for
the other cores in the same package. In that case, better performance
(and possibly lower energy consumption as well) can be achieved by
making the scheduler prefer to run tasks on the CPUs with higher max
turbo frequencies.

To that end, set up a core priority metric to abstract the core
preferences based on the maximum turbo frequency. In that metric,
the cores with higher maximum turbo frequencies are higher-priority
than the other cores in the same package and that causes the scheduler
to favor them when making load-balancing decisions using the asymmertic
packing approach. At the same time, the priority of SMT threads with a
higher CPU number is reduced so as to avoid scheduling tasks on all of
the threads that belong to a favored core before all of the other cores
have been given a task to run.

The priority metric will be initialized by the P-state driver with the
help of the sched_set_itmt_core_prio() function. The P-state driver
will also determine whether or not ITMT is supported by the platform
and will call sched_set_itmt_support() to indicate that.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# f5382de9 17-Nov-2016 Yazen Ghannam <Yazen.Ghannam@amd.com>

x86/mce/AMD: Add system physical address translation for AMD Fam17h

The Unified Memory Controllers (UMCs) on Fam17h log a normalized address
in their MCA_ADDR registers. We need to convert that normalized address
to a system physical address in order to support a few facilities:

1) To offline poisoned pages in DRAM proactively in the deferred error
handler.

2) To print sysaddr and page info for DRAM ECC errors in EDAC.

[ Boris: fixes/cleanups ontop:

* hi_addr_offset = 0 - no need for that branch. Stick it all under the
HiAddrOffsetEn case. It confines hi_addr_offset's declaration too.

* Move variables to the innermost scope they're used at so that we save
on stack and not blow it up immediately on function entry.

* Do not modify *sys_addr prematurely - we want to not exit early and
have modified *sys_addr some, which callers get to see. We either
convert to a sys_addr or we don't do anything. And we signal that with
the retval of the function.

* Rename label out -> out_err - because it is the error path.

* No need to pr_err of the conversion failed case: imagine a
sparsely-populated machine with UMCs which don't have DIMMs. Callers
should look at the retval instead and issue a printk only when really
necessary. No need for useless info in dmesg.

* s/temp_reg/tmp/ and other variable names shortening => shorter code.

* Use BIT() everywhere.

* Make error messages more informative.

* Small build fix for the !CONFIG_X86_MCE_AMD case.

* ... and more minor cleanups.
]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161122111133.mjzpvzhf7o7yl2oa@pd.tnic
[ Typo fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 59fe5a77 15-Nov-2016 Thomas Gleixner <tglx@linutronix.de>

x86/intel_rdt: Select KERNFS when enabling INTEL_RDT_A

arch/x86/kernel/cpu/intel_rdt_rdtgroup.c: In function 'rdtgroup_kn_lock_live':
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c:658:2: error: implicit declaration of
function 'kernfs_break_active_protection' [-Werror=implicit-function-declaration]

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>


# c763ea26 15-Nov-2016 Ingo Molnar <mingo@kernel.org>

x86/kconfig: Sort the 'config X86' selects alphabetically

This organizes the list a bit, plus reduces future conflicts (if people
remember to insert new options alphabetically that is).

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 953fee1d 15-Nov-2016 Ingo Molnar <mingo@kernel.org>

x86/kconfig: Clean up 32-bit compat options

Introduce a 'COMPAT_32' helper config value for 'X86_32 || IA32_EMULATION'
and use it.

Also move some selects to this new option, to remove more selects from
the generic X86 section.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 39f88911 15-Nov-2016 Ingo Molnar <mingo@kernel.org>

x86/kconfig: Clean up IA32_EMULATION select

Move a 'if IA32_EMULATION' select from the generic X86 section
to the IA32_EMULATION option.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 52c8e601 15-Nov-2016 Ingo Molnar <mingo@kernel.org>

x86/kconfig, x86/pkeys: Move pkeys selects to X86_INTEL_MEMORY_PROTECTION_KEYS

Move the pkeys selects from the generic x86 config section to
the X86_INTEL_MEMORY_PROTECTION_KEYS section.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d94e0685 15-Nov-2016 Ingo Molnar <mingo@kernel.org>

x86/kconfig: Move 64-bit only arch Kconfig selects to 'config X86_64'

These are easier to read when they come next to the X86_64 config.

Note that all remaining 'if X86_64' config options in the generic
section are in principle suitable for activation on 32-bit, but have
not been ported yet.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 341c787e 15-Nov-2016 Ingo Molnar <mingo@kernel.org>

x86/kconfig: Move 32-bit only arch Kconfig selects to 'config X86_32'

These are easier to read when they come next to the X86_32 config.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 78e99b4a 22-Oct-2016 Fenghua Yu <fenghua.yu@intel.com>

x86/intel_rdt: Add CONFIG, Makefile, and basic initialization

Introduce CONFIG_INTEL_RDT_A (default: no, dependent on CPU_SUP_INTEL) to
control inclusion of Resource Director Technology in the build.

Simple init() routine just checks which features are present. If they are
pr_info() one line summary for each feature for now.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 8c27ceff3 18-Oct-2016 Mauro Carvalho Chehab <mchehab@kernel.org>

docs: fix locations of several documents that got moved

The previous patch renamed several files that are cross-referenced
along the Kernel documentation. Adjust the links to point to
the right places.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>


# 51a02124 07-Oct-2016 Vineet Gupta <Vineet.Gupta1@synopsys.com>

atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE

This came to light when implementing native 64-bit atomics for ARCv2.

The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
to check whether atomic64_dec_if_positive() is available. It seems it
was needed when not every arch defined it. However as of current code
the Kconfig option seems needless

- for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
generic definition of API is present lib/atomic64.c
- arches with native 64-bit atomics select it in arch/*/Kconfig and
define the API in their headers

So I see no point in keeping the Kconfig option

Compile tested for:
- blackfin (CONFIG_GENERIC_ATOMIC64)
- x86 (!CONFIG_GENERIC_ATOMIC64)
- ia64

Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ming Lin <ming.l@ssi.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 461a7184 07-Oct-2016 Yisheng Xie <xieyisheng1@huawei.com>

mm/hugetlb: introduce ARCH_HAS_GIGANTIC_PAGE

Avoid making ifdef get pretty unwieldy if many ARCHs support gigantic
page. No functional change with this patch.

Link: http://lkml.kernel.org/r/1475227569-63446-2-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 181ffd19 03-Oct-2016 Keith Busch <kbusch@kernel.org>

x86/PCI: VMD: Move VMD driver to drivers/pci/host

Move the driver source and Kconfig to the PCI host bridge drivers directory
and move the config option to a more appropriate sub-menu instead of
occupying the top-level location.

Update the Kconfig option with the X86_64 dependency that was implicitly
included from the previous location, and add information about the module
name when built as a loadable module.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Jon Derrick <jonathan.derrick@intel.com>


# cfd8983f 18-May-2016 Peter Zijlstra <peterz@infradead.org>

x86, locking/spinlocks: Remove ticket (spin)lock implementation

We've unconditionally used the queued spinlock for many releases now.

Its time to remove the old ticket lock code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Waiman.Long@hpe.com
Cc: david.vrabel@citrix.com
Cc: dhowells@redhat.com
Cc: pbonzini@redhat.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20160518184302.GO3193@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 58cbbee2 22-Sep-2016 Vadim Pasternak <vadimp@mellanox.com>

x86/platform/mellanox: Introduce support for Mellanox systems platform

Enable system support for the Mellanox Technologies platform, which
provides support for the next Mellanox basic systems: "msx6710",
"msx6720", "msb7700", "msn2700", "msx1410", "msn2410", "msb7800",
"msn2740", "msn2100" and also various number of derivative systems from
the above basic types.

The Kconfig controlling compilation of this code is: MLX_PLATFORM

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Cc: jiri@resnulli.us
Cc: gregkh@linuxfoundation.org
Cc: platform-driver-x86@vger.kernel.org
Cc: geert@linux-m68k.org
Cc: linux@roeck-us.net
Cc: akpm@linux-foundation.org
Cc: mchehab@kernel.org
Cc: davem@davemloft.net
Cc: kvalo@codeaurora.org
Link: http://lkml.kernel.org/r/1474578822-33805-1-git-send-email-vadimp@mellanox.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 15f4eae7 13-Sep-2016 Andy Lutomirski <luto@kernel.org>

x86: Move thread_info into task_struct

Now that most of the thread_info users have been cleaned up,
this is straightforward.

Most of this code was written by Linus.

Originally-from: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a50eab40abeaec9cb9a9e3cbdeafd32190206654.1473801993.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0d025d27 30-Aug-2016 Josh Poimboeuf <jpoimboe@redhat.com>

mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS

There are three usercopy warnings which are currently being silenced for
gcc 4.6 and newer:

1) "copy_from_user() buffer size is too small" compile warning/error

This is a static warning which happens when object size and copy size
are both const, and copy size > object size. I didn't see any false
positives for this one. So the function warning attribute seems to
be working fine here.

Note this scenario is always a bug and so I think it should be
changed to *always* be an error, regardless of
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.

2) "copy_from_user() buffer size is not provably correct" compile warning

This is another static warning which happens when I enable
__compiletime_object_size() for new compilers (and
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS). It happens when object size
is const, but copy size is *not*. In this case there's no way to
compare the two at build time, so it gives the warning. (Note the
warning is a byproduct of the fact that gcc has no way of knowing
whether the overflow function will be called, so the call isn't dead
code and the warning attribute is activated.)

So this warning seems to only indicate "this is an unusual pattern,
maybe you should check it out" rather than "this is a bug".

I get 102(!) of these warnings with allyesconfig and the
__compiletime_object_size() gcc check removed. I don't know if there
are any real bugs hiding in there, but from looking at a small
sample, I didn't see any. According to Kees, it does sometimes find
real bugs. But the false positive rate seems high.

3) "Buffer overflow detected" runtime warning

This is a runtime warning where object size is const, and copy size >
object size.

All three warnings (both static and runtime) were completely disabled
for gcc 4.6 with the following commit:

2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")

That commit mistakenly assumed that the false positives were caused by a
gcc bug in __compiletime_object_size(). But in fact,
__compiletime_object_size() seems to be working fine. The false
positives were instead triggered by #2 above. (Though I don't have an
explanation for why the warnings supposedly only started showing up in
gcc 4.6.)

So remove warning #2 to get rid of all the false positives, and re-enable
warnings #1 and #3 by reverting the above commit.

Furthermore, since #1 is a real bug which is detected at compile time,
upgrade it to always be an error.

Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
needed.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e4a744ef 19-Aug-2016 Josh Poimboeuf <jpoimboe@redhat.com>

ftrace: Remove CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST from config

Make HAVE_FUNCTION_GRAPH_FP_TEST a normal define, independent from
kconfig. This removes some config file pollution and simplifies the
checking for the fp test.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2c4e5f05054d6d367f702fd153af7a0109dd5c81.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e37e43a4 11-Aug-2016 Andy Lutomirski <luto@kernel.org>

x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)

This allows x86_64 kernels to enable vmapped stacks by setting
HAVE_ARCH_VMAP_STACK=y - which enables the CONFIG_VMAP_STACK=y
high level Kconfig option.

There are a couple of interesting bits:

First, x86 lazily faults in top-level paging entries for the vmalloc
area. This won't work if we get a page fault while trying to access
the stack: the CPU will promote it to a double-fault and we'll die.
To avoid this problem, probe the new stack when switching stacks and
forcibly populate the pgd entry for the stack when switching mms.

Second, once we have guard pages around the stack, we'll want to
detect and handle stack overflow.

I didn't enable it on x86_32. We'd need to rework the double-fault
code a bit and I'm concerned about running out of vmalloc virtual
addresses under some workloads.

This patch, by itself, will behave somewhat erratically when the
stack overflows while RSP is still more than a few tens of bytes
above the bottom of the stack. Specifically, we'll get #PF and make
it to no_context and them oops without reliably triggering a
double-fault, and no_context doesn't know about stack overflows.
The next patch will improve that case.

Thank you to Nadav and Brian for helping me pay enough attention to
the SDM to hopefully get this right.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c88f3e2920b18e6cc621d772a04a62c06869037e.1470907718.git.luto@kernel.org
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5b710f34 23-Jun-2016 Kees Cook <keescook@chromium.org>

x86/uaccess: Enable hardened usercopy

Enables CONFIG_HARDENED_USERCOPY checks on x86. This is done both in
copy_*_user() and __copy_*_user() because copy_*_user() actually calls
down to _copy_*_user() and not __copy_*_user().

Based on code from PaX and grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>


# 0f60a8ef 12-Jul-2016 Kees Cook <keescook@chromium.org>

mm: Implement stack frame object validation

This creates per-architecture function arch_within_stack_frames() that
should validate if a given object is contained by a kernel stack frame.
Initial implementation is on x86.

This is based on code from PaX.

Signed-off-by: Kees Cook <keescook@chromium.org>


# 90397a41 21-Jun-2016 Thomas Garnier <thgarnie@google.com>

x86/mm: Add memory hotplug support for KASLR memory randomization

Add a new option (CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING) to define
the padding used for the physical memory mapping section when KASLR
memory is enabled. It ensures there is enough virtual address space when
CONFIG_MEMORY_HOTPLUG is used. The default value is 10 terabytes. If
CONFIG_MEMORY_HOTPLUG is not used, no space is reserved increasing the
entropy available.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-10-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0483e1fa 21-Jun-2016 Thomas Garnier <thgarnie@google.com>

x86/mm: Implement ASLR for kernel memory regions

Randomizes the virtual address space of kernel memory regions for
x86_64. This first patch adds the infrastructure and does not randomize
any region. The following patches will randomize the physical memory
mapping, vmalloc and vmemmap regions.

This security feature mitigates exploits relying on predictable kernel
addresses. These addresses can be used to disclose the kernel modules
base addresses or corrupt specific structures to elevate privileges
bypassing the current implementation of KASLR. This feature can be
enabled with the CONFIG_RANDOMIZE_MEMORY option.

The order of each memory region is not changed. The feature looks at the
available space for the regions based on different configuration options
and randomizes the base and space between each. The size of the physical
memory mapping is the available physical memory. No performance impact
was detected while testing the feature.

Entropy is generated using the KASLR early boot functions now shared in
the lib directory (originally written by Kees Cook). Randomization is
done on PGD & PUD page table levels to increase possible addresses. The
physical memory mapping code was adapted to support PUD level virtual
addresses. This implementation on the best configuration provides 30,000
possible virtual addresses in average for each memory region. An
additional low memory page is used to ensure each CPU can start with a
PGD aligned virtual address (for realmode).

x86/dump_pagetable was updated to correctly display each region.

Updated documentation on x86_64 memory layout accordingly.

Performance data, after all patches in the series:

Kernbench shows almost no difference (-+ less than 1%):

Before:

Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115) System Time 87.056 (0.456416) Percent CPU 1092.9
(13.892) Context Switches 199805 (3455.33) Sleeps 97907.8 (900.636)

After:

Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053) System Time 87.764 (0.49345) Percent CPU 1095
(12.7715) Context Switches 199036 (4298.1) Sleeps 97681.6 (1031.11)

Hackbench shows 0% difference on average (hackbench 90 repeated 10 times):

attemp,before,after 1,0.076,0.069 2,0.072,0.069 3,0.066,0.066 4,0.066,0.068
5,0.066,0.067 6,0.066,0.069 7,0.067,0.066 8,0.063,0.067 9,0.067,0.065
10,0.068,0.071 average,0.0677,0.0677

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ed9f007e 25-May-2016 Kees Cook <keescook@chromium.org>

x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G

We want the physical address to be randomized anywhere between
16MB and the top of physical memory (up to 64TB).

This patch exchanges the prior slots[] array for the new slot_areas[]
array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical
address offset for 64-bit. As before, process_e820_entry() walks
memory and populates slot_areas[], splitting on any detected mem_avoid
collisions.

Finally, since the slots[] array and its associated functions are not
needed any more, so they are removed.

Based on earlier patches by Baoquan He.

Originally-from: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1464216334-17200-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d6faca40 01-Jun-2016 Arnd Bergmann <arnd@arndb.de>

rtc: move mc146818 helper functions out-of-line

The mc146818_get_time/mc146818_set_time functions are rather large
inline functions in a global header file and are used in several
drivers and in x86 specific code.

Here we move them into a separate .c file that is compiled whenever
any of the users require it. This also lets us remove the linux/acpi.h
header inclusion from mc146818rtc.h, which in turn avoids some
warnings about duplicate definition of the TRUE/FALSE macros.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>


# 91dda51a 20-Jun-2016 Aleksey Makarov <aleksey.makarov@linaro.org>

ACPI / tables: introduce ARCH_HAS_ACPI_TABLE_UPGRADE

We want to use the table upgrade feature in ARM64.
Introduce a new configuration option that allows that.

Signed-off-by: Aleksey Makarov <aleksey.makarov@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3a495511 27-May-2016 William Breathitt Gray <vilhelm.gray@gmail.com>

isa: Allow ISA-style drivers on modern systems

Several modern devices, such as PC/104 cards, are expected to run on
modern systems via an ISA bus interface. Since ISA is a legacy interface
for most modern architectures, ISA support should remain disabled in
general. Support for ISA-style drivers should be enabled on a per driver
basis.

To allow ISA-style drivers on modern systems, this patch introduces the
ISA_BUS_API and ISA_BUS Kconfig options. The ISA bus driver will now
build conditionally on the ISA_BUS_API Kconfig option, which defaults to
the legacy ISA Kconfig option. The ISA_BUS Kconfig option allows the
ISA_BUS_API Kconfig option to be selected on architectures which do not
enable ISA (e.g. X86_64).

The ISA_BUS Kconfig option is currently only implemented for X86
architectures. Other architectures may have their own ISA_BUS Kconfig
options added as required.

Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 0145071b 02-Jun-2016 Linus Walleij <linus.walleij@linaro.org>

x86: Do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB

This replaces:

- "select ARCH_REQUIRE_GPIOLIB" with "select GPIOLIB" as this can
now be selected directly.

- "select ARCH_WANT_OPTIONAL_GPIOLIB" with no dependency: GPIOLIB
is now selectable by everyone, so we need not declare our
intent to select it.

When ordering the symbols the following rationale was used:
if the selects were in alphabetical order, I moved select GPIOLIB
to be in alphabetical order, but if the selects were not
maintained in alphabetical order, I just replaced
"select ARCH_REQUIRE_GPIOLIB" with "select GPIOLIB".

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Michael Büsch <m@bues.ch>
Link: http://lkml.kernel.org/r/1464870018-8281-1-git-send-email-linus.walleij@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# f5967101 29-May-2016 Borislav Petkov <bp@suse.de>

x86/hweight: Get rid of the special calling convention

People complained about ARCH_HWEIGHT_CFLAGS and how it throws a wrench
into kcov, lto, etc, experimentations.

Add asm versions for __sw_hweight{32,64}() and do explicit saving and
restoring of clobbered registers. This gets rid of the special calling
convention. We get to call those functions on !X86_FEATURE_POPCNT CPUs.

We still need to hardcode POPCNT and register operands as some old gas
versions which we support, do not know about POPCNT.

Btw, remove redundant REX prefix from 32-bit POPCNT because alternatives
can do padding now.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1464605787-20603-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6b90bd4b 23-May-2016 Emese Revfy <re.emese@gmail.com>

GCC plugin infrastructure

This patch allows to build the whole kernel with GCC plugins. It was ported from
grsecurity/PaX. The infrastructure supports building out-of-tree modules and
building in a separate directory. Cross-compilation is supported too.
Currently the x86, arm, arm64 and uml architectures enable plugins.

The directory of the gcc plugins is scripts/gcc-plugins. You can use a file or a directory
there. The plugins compile with these options:
* -fno-rtti: gcc is compiled with this option so the plugins must use it too
* -fno-exceptions: this is inherited from gcc too
* -fasynchronous-unwind-tables: this is inherited from gcc too
* -ggdb: it is useful for debugging a plugin (better backtrace on internal
errors)
* -Wno-narrowing: to suppress warnings from gcc headers (ipa-utils.h)
* -Wno-unused-variable: to suppress warnings from gcc headers (gcc_version
variable, plugin-version.h)

The infrastructure introduces a new Makefile target called gcc-plugins. It
supports all gcc versions from 4.5 to 6.0. The scripts/gcc-plugin.sh script
chooses the proper host compiler (gcc-4.7 can be built by either gcc or g++).
This script also checks the availability of the included headers in
scripts/gcc-plugins/gcc-common.h.

The gcc-common.h header contains frequently included headers for GCC plugins
and it has a compatibility layer for the supported gcc versions.

The gcc-generate-*-pass.h headers automatically generate the registration
structures for GIMPLE, SIMPLE_IPA, IPA and RTL passes.

Note that 'make clean' keeps the *.so files (only the distclean or mrproper
targets clean all) because they are needed for out-of-tree modules.

Based on work created by the PaX Team.

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Marek <mmarek@suse.com>


# 51e68d05 21-May-2016 Linus Torvalds <torvalds@linux-foundation.org>

x86 isa: add back X86_32 dependency on CONFIG_ISA

Commit b3c1be1b789c ("base: isa: Remove X86_32 dependency") made ISA
support available on x86-64 too. That's not right - while there are
some LPC-style devices that might be useful still and be based on
ISA-like IP blocks, that is *not* an excuse to try to enable any random
legacy drivers.

Such drivers should be individually enabled and made to perhaps depend
on ISA_DMA_API instead (which we have continued to support on x86-64).
Or we could add another "ISA_XYZ_API" that we support that doesn't
enable random old drivers that aren't even 64-bit clean nor do we have
any test coverage for.

Turning off ISA will now also turn off some drivers that have been
marked as depending on it as part of this series, and that used to work
on modern platforms.

See for example commits ad7afc38eab3..cc736607c86d, which may also need
to be reverted.

This commit means that the warnings that came in due to enabling ISA
widely are now gone again.

Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 42a0bb3f 20-May-2016 Petr Mladek <pmladek@suse.com>

printk/nmi: generic solution for safe printk in NMI

printk() takes some locks and could not be used a safe way in NMI
context.

The chance of a deadlock is real especially when printing stacks from
all CPUs. This particular problem has been addressed on x86 by the
commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all
CPUs").

The patchset brings two big advantages. First, it makes the NMI
backtraces safe on all architectures for free. Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited. We still should keep the number of messages in NMI context at
minimum).

Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers. These are not easy to avoid.

This patch reuses most of the code and makes it generic. It is useful
for all messages and architectures that support NMI.

The alternative printk_func is set when entering and is reseted when
leaving NMI context. It queues IRQ work to copy the messages into the
main ring buffer in a safe context.

__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers. There is also used a spinlock to get synchronized with other
flushers.

We do not longer use seq_buf because it depends on external lock. It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.

The code is put into separate printk/nmi.c as suggested by Steven
Rostedt. It needs a per-CPU buffer and is compiled only on
architectures that call nmi_enter(). This is achieved by the new
HAVE_NMI Kconfig flag.

The are MN10300 and Xtensa architectures. We need to clean up NMI
handling there first. Let's do it separately.

The patch is heavily based on the draft from Peter Zijlstra, see

https://lkml.org/lkml/2015/6/10/327

[arnd@arndb.de: printk-nmi: use %zu format string for size_t]
[akpm@linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> [arm part]
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5f56a5df 20-May-2016 Jiri Slaby <jirislaby@kernel.org>

exit_thread: remove empty bodies

Define HAVE_EXIT_THREAD for archs which want to do something in
exit_thread. For others, let's define exit_thread as an empty inline.

This is a cleanup before we change the prototype of exit_thread to
accept a task parameter.

[akpm@linux-foundation.org: fix mips]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6077776b 13-May-2016 Daniel Borkmann <daniel@iogearbox.net>

bpf: split HAVE_BPF_JIT into cBPF and eBPF variant

Split the HAVE_BPF_JIT into two for distinguishing cBPF and eBPF JITs.

Current cBPF ones:

# git grep -n HAVE_CBPF_JIT arch/
arch/arm/Kconfig:44: select HAVE_CBPF_JIT
arch/mips/Kconfig:18: select HAVE_CBPF_JIT if !CPU_MICROMIPS
arch/powerpc/Kconfig:129: select HAVE_CBPF_JIT
arch/sparc/Kconfig:35: select HAVE_CBPF_JIT

Current eBPF ones:

# git grep -n HAVE_EBPF_JIT arch/
arch/arm64/Kconfig:61: select HAVE_EBPF_JIT
arch/s390/Kconfig:126: select HAVE_EBPF_JIT if PACK_STACK && HAVE_MARCH_Z196_FEATURES
arch/x86/Kconfig:94: select HAVE_EBPF_JIT if X86_64

Later code also needs this facility to check for eBPF JITs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8ac0fba2 01-May-2016 William Breathitt Gray <vilhelm.gray@gmail.com>

isa: Decouple X86_32 dependency from the ISA Kconfig option

The introduction of the ISA_BUS option blocks the compilation of ISA
drivers on non-x86 platforms. The ISA_BUS configuration option should
not be necessary if the X86_32 dependency can be decoupled from the ISA
configuration option. This patch both removes the ISA_BUS configuration
option entirely and removes the X86_32 dependency from the ISA
configuration option.

Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# e8581e3d 20-Apr-2016 Baoquan He <bhe@redhat.com>

x86/KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET

Currently CONFIG_RANDOMIZE_BASE_MAX_OFFSET is used to limit the maximum
offset for kernel randomization. This limit doesn't need to be a CONFIG
since it is tied completely to KERNEL_IMAGE_SIZE, and will make no sense
once physical and virtual offsets are randomized separately. This patch
removes CONFIG_RANDOMIZE_BASE_MAX_OFFSET and consolidates the Kconfig
help text.

[kees: rewrote changelog, dropped KERNEL_IMAGE_SIZE_DEFAULT, rewrote help]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 07dc900e 29-Mar-2016 Peter Zijlstra <peterz@infradead.org>

perf/x86: Move Kconfig.perf and other perf configuration bits to events/Kconfig

Ingo says:

"If we do a separate file we should have it in arch/x86/events/Kconfig
(not in arch/x86/Kconfig.perf), and also move some of the other bits,
such as PERF_EVENTS_AMD_POWER?"

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e633c65a 20-Mar-2016 Kan Liang <kan.liang@intel.com>

x86/perf/intel/uncore: Make the Intel uncore PMU driver modular

By default, the uncore driver will be built into the kernel. If it is
configured as a module, the supported CPU model can be auto loaded.

This patch also cleans up the code of uncore_cpu_init() and
uncore_pci_init().

Based-on-a-patch-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1458462817-2475-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b3c1be1b 22-Jan-2016 William Breathitt Gray <vilhelm.gray@gmail.com>

base: isa: Remove X86_32 dependency

Many motherboards utilize a LPC to ISA bridge in order to decode
ISA-style port-mapped I/O addresses. This is particularly true for
embedded motherboards supporting the PC/104 bus (a bus specification
derived from ISA).

These motherboards are now commonly running 64-bit x86 processors. The
X86_32 dependency should be removed from the ISA bus configuration
option in order to support these newer motherboards.

A new config option, CONFIG_ISA_BUS, is introduced to allow for the
compilation of the ISA bus driver independent of the CONFIG_ISA option.
Devices which communicate via ISA-compatible buses can now be supported
independent of the dependencies of the CONFIG_ISA option.

Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 5c9a8750 22-Mar-2016 Dmitry Vyukov <dvyukov@google.com>

kernel: add kcov code coverage

kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing). Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system. A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/). However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible. It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g. scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes. Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch). I've
dropped the second mode for simplicity.

This patch adds the necessary support on kernel side. The complimentary
compiler support was added in gcc revision 231296.

We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:

https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation". For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.

Why not gcov. Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
typical coverage can be just a dozen of basic blocks (e.g. an invalid
input). In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M). Cost of
kcov depends only on number of executed basic blocks/edges. On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is
insecure. But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c7ab62bf 08-Mar-2016 Huang Rui <ray.huang@amd.com>

perf/x86/amd/power: Add AMD accumulated power reporting mechanism

Introduce an AMD accumlated power reporting mechanism for the Family
15h, Model 60h processor that can be used to calculate the average
power consumed by a processor during a measurement interval. The
feature support is indicated by CPUID Fn8000_0007_EDX[12].

This feature will be implemented both in hwmon and perf. The current
design provides one event to report per package/processor power
consumption by counting each compute unit power value.

Here the gory details of how the computation is done:

* Tsample: compute unit power accumulator sample period
* Tref: the PTSC counter period (PTSC: performance timestamp counter)
* N: the ratio of compute unit power accumulator sample period to the
PTSC period

* Jmax: max compute unit accumulated power which is indicated by
MSR_C001007b[MaxCpuSwPwrAcc]

* Jx/Jy: compute unit accumulated power which is indicated by
MSR_C001007a[CpuSwPwrAcc]

* Tx/Ty: the value of performance timestamp counter which is indicated
by CU_PTSC MSR_C0010280[PTSC]
* PwrCPUave: CPU average power

i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007.
N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]].

ii. Read the full range of the cumulative energy value from the new
MSR MaxCpuSwPwrAcc.
Jmax = value returned.

iii. At time x, software reads CpuSwPwrAcc and samples the PTSC.
Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC.

iv. At time y, software reads CpuSwPwrAcc and samples the PTSC.
Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC.

v. Calculate the average power consumption for a compute unit over
time period (y-x). Unit of result is uWatt:

if (Jy < Jx) // Rollover has occurred
Jdelta = (Jy + Jmax) - Jx
else
Jdelta = Jy - Jx
PwrCPUave = N * Jdelta * 1000 / (Ty - Tx)

Simple example:

root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' make -j4
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
CHK include/generated/timeconst.h
CHK include/generated/bounds.h
CHK include/generated/asm-offsets.h
CALL scripts/checksyscalls.sh
CHK include/generated/compile.h
SKIPPED include/generated/compile.h
Building modules, stage 2.
Kernel: arch/x86/boot/bzImage is ready (#40)
MODPOST 4225 modules

Performance counter stats for 'system wide':

183.44 mWatts power/power-pkg/

341.837270111 seconds time elapsed

root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' sleep 10

Performance counter stats for 'system wide':

0.18 mWatts power/power-pkg/

10.012551815 seconds time elapsed

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jacob.w.shin@gmail.com
Link: http://lkml.kernel.org/r/1457502306-2559-1-git-send-email-ray.huang@amd.com
[ Fixed the modular build. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e7e127e3 08-Mar-2016 Bjorn Helgaas <bhelgaas@google.com>

PCI: Include pci/hotplug Kconfig directly from pci/Kconfig

Include pci/hotplug/Kconfig directly from pci/Kconfig, so arches don't
have to source both pci/Kconfig and pci/hotplug/Kconfig.

Note that this effectively adds pci/hotplug/Kconfig to the following
arches, because they already sourced drivers/pci/Kconfig but they
previously did not source drivers/pci/hotplug/Kconfig:

alpha
arm
avr32
frv
m68k
microblaze
mn10300
sparc
unicore32

Inspired-by-patch-from: Bogicevic Sasa <brutallesale@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 5f8fc432 03-Feb-2016 Bogicevic Sasa <brutallesale@gmail.com>

PCI: Include pci/pcie/Kconfig directly from pci/Kconfig

Include pci/pcie/Kconfig directly from pci/Kconfig, so arches don't
have to source both pci/Kconfig and pci/pcie/Kconfig.

Note that this effectively adds pci/pcie/Kconfig to the following
arches, because they already sourced drivers/pci/Kconfig but they
previously did not source drivers/pci/pcie/Kconfig:

alpha
avr32
blackfin
frv
m32r
m68k
microblaze
mn10300
parisc
sparc
unicore32
xtensa

[bhelgaas: changelog, source pci/pcie/Kconfig at top of pci/Kconfig, whitespace]
Signed-off-by: Sasa Bogicevic <brutallesale@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d4883d5d 28-Feb-2016 Josh Poimboeuf <jpoimboe@redhat.com>

objtool: Enable stack metadata validation on 64-bit x86

Set HAVE_STACK_VALIDATION to enable stack metadata validation for
x86_64.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/cdaeb6914d00a070c0f455cd06989bf3f787a2f6.1456719558.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9ccaf77c 17-Feb-2016 Kees Cook <keescook@chromium.org>

x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option

This removes the CONFIG_DEBUG_RODATA option and makes it always enabled.

This simplifies the code and also makes it clearer that read-only mapped
memory is just as fundamental a security feature in kernel-space as it is
in user-space.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 66d37570 12-Feb-2016 Dave Hansen <dave.hansen@linux.intel.com>

mm/core, x86/mm/pkeys: Add arch_validate_pkey()

The syscall-level code is passed a protection key and need to
return an appropriate error code if the protection key is bogus.
We will be using this in subsequent patches.

Note that this also begins a series of arch-specific calls that
we need to expose in otherwise arch-independent code. We create
a linux/pkeys.h header where we will put *all* the stubs for
these functions.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210232.774EEAAB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 284244a9 12-Feb-2016 Dave Hansen <dave.hansen@linux.intel.com>

x86/mm/pkeys: Add Kconfig prompt to existing config option

I don't have a strong opinion on whether we need this or not.
Protection Keys has relatively little code associated with it,
and it is not a heavyweight feature to keep enabled. However,
I can imagine that folks would still appreciate being able to
disable it.

Here's the option if folks want it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210228.7E79386C@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 63c17fb8 12-Feb-2016 Dave Hansen <dave.hansen@linux.intel.com>

mm/core, x86/mm/pkeys: Store protection bits in high VMA flags

vma->vm_flags is an 'unsigned long', so has space for 32 flags
on 32-bit architectures. The high 32 bits are unused on 64-bit
platforms. We've steered away from using the unused high VMA
bits for things because we would have difficulty supporting it
on 32-bit.

Protection Keys are not available in 32-bit mode, so there is
no concern about supporting this feature in 32-bit mode or on
32-bit CPUs.

This patch carves out 4 bits from the high half of
vma->vm_flags and allows architectures to set config option
to make them available.

Sparse complains about these constants unless we explicitly
call them "UL".

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Valentin Rothberg <valentinrothberg@gmail.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210208.81AF00D5@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4e7f9df2 10-Feb-2016 Michael S. Tsirkin <mst@redhat.com>

hpet: Drop stale URLs

Looks like the HPET spec at intel.com got moved.
It isn't hard to find so drop the link, just mention
the revision assumed.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1455145462-3877-1-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 35e97790 12-Feb-2016 Dave Hansen <dave.hansen@linux.intel.com>

x86/mm/pkeys: Add Kconfig option

I don't have a strong opinion on whether we need a Kconfig prompt
or not. Protection Keys has relatively little code associated
with it, and it is not a heavyweight feature to keep enabled.
However, I can imagine that folks would still appreciate being
able to disable it.

Note that, with disabled-features.h, the checks in the code
for protection keys are always the same:

cpu_has(c, X86_FEATURE_PKU)

With the config option disabled, this essentially turns into an

We will hide the prompt for now.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210200.DB7055E8@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1ecb4ae5 11-Feb-2016 Andrew Morton <akpm@linux-foundation.org>

arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFI

arch/x86/built-in.o: In function `uv_bios_call':
(.text+0xeba00): undefined reference to `efi_call'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5f9c01aa 02-Feb-2016 Borislav Petkov <bp@suse.de>

x86/microcode: Untangle from BLK_DEV_INITRD

Thomas Voegtle reported that doing oldconfig with a .config which has
CONFIG_MICROCODE enabled but BLK_DEV_INITRD disabled prevents the
microcode loading mechanism from being built.

So untangle it from the BLK_DEV_INITRD dependency so that oldconfig
doesn't turn it off and add an explanatory text to its Kconfig help what
the supported methods for supplying microcode are.

Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.4
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e1c7e324 20-Jan-2016 Christoph Hellwig <hch@lst.de>

dma-mapping: always provide the dma_map_ops based implementation

Move the generic implementation to <linux/dma-mapping.h> now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.

[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c6d30853 20-Jan-2016 Andrey Ryabinin <ryabinin.a.a@gmail.com>

UBSAN: run-time undefined behavior sanity checker

UBSAN uses compile-time instrumentation to catch undefined behavior
(UB). Compiler inserts code that perform certain kinds of checks before
operations that could cause UB. If check fails (i.e. UB detected)
__ubsan_handle_* function called to print error message.

So the most of the work is done by compiler. This patch just implements
ubsan handlers printing errors.

GCC has this capability since 4.9.x [1] (see -fsanitize=undefined
option and its suboptions).
However GCC 5.x has more checkers implemented [2].
Article [3] has a bit more details about UBSAN in the GCC.

[1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html
[2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
[3] - http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

Issues which UBSAN has found thus far are:

Found bugs:

* out-of-bounds access - 97840cb67ff5 ("netfilter: nfnetlink: fix
insufficient validation in nfnetlink_bind")

undefined shifts:

* d48458d4a768 ("jbd2: use a better hash function for the revoke
table")

* 10632008b9e1 ("clockevents: Prevent shift out of bounds")

* 'x << -1' shift in ext4 -
http://lkml.kernel.org/r/<5444EF21.8020501@samsung.com>

* undefined rol32(0) -
http://lkml.kernel.org/r/<1449198241-20654-1-git-send-email-sasha.levin@oracle.com>

* undefined dirty_ratelimit calculation -
http://lkml.kernel.org/r/<566594E2.3050306@odin.com>

* undefined roundown_pow_of_two(0) -
http://lkml.kernel.org/r/<1449156616-11474-1-git-send-email-sasha.levin@oracle.com>

* [WONTFIX] undefined shift in __bpf_prog_run -
http://lkml.kernel.org/r/<CACT4Y+ZxoR3UjLgcNdUm4fECLMx2VdtfrENMtRRCdgHB2n0bJA@mail.gmail.com>

WONTFIX here because it should be fixed in bpf program, not in kernel.

signed overflows:

* 32a8df4e0b33f ("sched: Fix odd values in effective_load()
calculations")

* mul overflow in ntp -
http://lkml.kernel.org/r/<1449175608-1146-1-git-send-email-sasha.levin@oracle.com>

* incorrect conversion into rtc_time in rtc_time64_to_tm() -
http://lkml.kernel.org/r/<1449187944-11730-1-git-send-email-sasha.levin@oracle.com>

* unvalidated timespec in io_getevents() -
http://lkml.kernel.org/r/<CACT4Y+bBxVYLQ6LtOKrKtnLthqLHcw-BMp3aqP3mjdAvr9FULQ@mail.gmail.com>

* [NOTABUG] signed overflow in ktime_add_safe() -
http://lkml.kernel.org/r/<CACT4Y+aJ4muRnWxsUe1CMnA6P8nooO33kwG-c8YZg=0Xc8rJqw@mail.gmail.com>

[akpm@linux-foundation.org: fix unused local warning]
[akpm@linux-foundation.org: fix __int128 build woes]
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yury Gribov <y.gribov@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3fda5bb4 15-Jan-2016 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

x86/platform/intel-mid: Enable 64-bit build

Intel Tangier SoC is known to have 64-bit dual core CPU. Enable
64-bit build for it.

The kernel has been tested on Intel Edison board:

Linux buildroot 4.4.0-next-20160115+ #25 SMP Fri Jan 15 22:03:19 EET 2016 x86_64 GNU/Linux

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 74
model name : Genuine Intel(R) CPU 4000 @ 500MHz
stepping : 8

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452888668-147116-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# da48d094 15-Jan-2016 Will Deacon <will@kernel.org>

Kconfig: remove HAVE_LATENCYTOP_SUPPORT

As illustrated by commit a3afe70b83fd ("[S390] latencytop s390
support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to
advertise an implementation of save_stack_trace_tsk.

However, as of 9212ddb5eada ("stacktrace: provide save_stack_trace_tsk()
weak alias") a dummy implementation is provided if STACKTRACE=y. Given
that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects
STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Helge Deller <deller@gmx.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 185a383a 12-Jan-2016 Keith Busch <kbusch@kernel.org>

x86/PCI: Add driver for Intel Volume Management Device (VMD)

The Intel Volume Management Device (VMD) is a Root Complex Integrated
Endpoint that acts as a host bridge to a secondary PCIe domain. BIOS can
reassign one or more Root Ports to appear within a VMD domain instead of
the primary domain. The immediate benefit is that additional PCIe domains
allow more than 256 buses in a system by letting bus numbers be reused
across different domains.

VMD domains do not define ACPI _SEG, so to avoid domain clashing with host
bridges defining this segment, VMD domains start at 0x10000, which is
greater than the highest possible 16-bit ACPI defined _SEG.

This driver enumerates and enables the domain using the root bus
configuration interface provided by the PCI subsystem. The driver provides
configuration space accessor functions (pci_ops), bus and memory resources,
an MSI IRQ domain with irq_chip implementation, and DMA operations
necessary to use devices through the VMD endpoint's interface.

VMD routes I/O as follows:

1) Configuration Space: BAR 0 ("CFGBAR") of VMD provides the base
address and size for configuration space register access to VMD-owned
root ports. It works similarly to MMCONFIG for extended configuration
space. Bus numbering is independent and does not conflict with the
primary domain.

2) MMIO Space: BARs 2 and 4 ("MEMBAR1" and "MEMBAR2") of VMD provide the
base address, size, and type for MMIO register access. These addresses
are not translated by VMD hardware; they are simply reservations to be
distributed to root ports' memory base/limit registers and subdivided
among devices downstream.

3) DMA: To interact appropriately with an IOMMU, the source ID DMA read
and write requests are translated to the bus-device-function of the VMD
endpoint. Otherwise, DMA operates normally without VMD-specific address
translation.

4) Interrupts: Part of VMD's BAR 4 is reserved for VMD's MSI-X Table and
PBA. MSIs from VMD domain devices and ports are remapped to appear as
if they were issued using one of VMD's MSI-X table entries. Each MSI
and MSI-X address of VMD-owned devices and ports has a special format
where the address refers to specific entries in the VMD's MSI-X table.
As with DMA, the interrupt source ID is translated to VMD's
bus-device-function.

The driver provides its own MSI and MSI-X configuration functions
specific to how MSI messages are used within the VMD domain, and
provides an irq_chip for independent IRQ allocation to relay interrupts
from VMD's interrupt handler to the appropriate device driver's handler.

5) Errors: PCIe error message are intercepted by the root ports normally
(e.g., AER), except with VMD, system errors (i.e., firmware first) are
disabled by default. AER and hotplug interrupts are translated in the
same way as endpoint interrupts.

6) VMD does not support INTx interrupts or IO ports. Devices or drivers
requiring these features should either not be placed below VMD-owned
root ports, or VMD should be disabled by BIOS for such endpoints.

[bhelgaas: add VMD BAR #defines, factor out vmd_cfg_addr(), rework VMD
resource setup, whitespace, changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de> (IRQ-related parts)


# 9e08f57d 14-Jan-2016 Daniel Cashman <dcashman@google.com>

x86: mm: support ARCH_MMAP_RND_BITS

x86: arch_mmap_rnd() uses hard-coded values, 8 for 32-bit and 28 for
64-bit, to generate the random offset for the mmap base address. This
value represents a compromise between increased ASLR effectiveness and
avoiding address-space fragmentation. Replace it with a Kconfig option,
which is sensibly bounded, so that platform developers may choose where
to place this compromise. Keep default values as new minimums.

Signed-off-by: Daniel Cashman <dcashman@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Nick Kralevich <nnk@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Borislav Petkov <bp@suse.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 21266be9 19-Nov-2015 Dan Williams <dan.j.williams@intel.com>

arch: consolidate CONFIG_STRICT_DEVM in lib/Kconfig.debug

Let all the archs that implement devmem_is_allowed() opt-in to a common
definition of CONFIG_STRICT_DEVM in lib/Kconfig.debug.

Cc: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
[heiko: drop 'default y' for s390]
Acked-by: Ingo Molnar <mingo@redhat.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# eebb3e8d 11-Dec-2015 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

ACPI / LPSS: override power state for LPSS DMA device

This is a third approach to workaround long standing issue with LPSS on
BayTrail. First one [1] was reverted since it didn't resolve the issue
comprehensively. Second one [2] was rejected by internal review.

The LPSS DMA controller does not have neither _PS0 nor _PS3 method. Moreover it
can be powered off automatically whenever the last LPSS device goes down. In
case of no power any access to the DMA controller will hang the system. The
behaviour is reproduced on some HP laptops based on Intel BayTrail [3,4] as
well as on ASuS T100TA transformer.

Power on the LPSS island through the registers accessible in a specific way.

[1] http://www.spinics.net/lists/linux-acpi/msg53963.html
[2] https://bugzilla.redhat.com/attachment.cgi?id=1066779&action=diff
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1184273
[4] http://www.spinics.net/lists/dmaengine/msg01514.html

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6e1315fe 07-Dec-2015 Borislav Petkov <bp@suse.de>

x86/cpu: Provide a config option to disable static_cpu_has

This brings .text savings of about ~1.6K when building a tinyconfig. It
is off by default so nothing changes for the default.

Kconfig help text from Josh.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/1449481182-27541-5-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 45e898b7 09-Nov-2015 Waiman Long <Waiman.Long@hpe.com>

locking/pvqspinlock: Collect slowpath lock statistics

This patch enables the accumulation of kicking and waiting related
PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration
option is selected. It also enables the collection of data which
enable us to calculate the kicking and wakeup latencies which have
a heavy dependency on the CPUs being used.

The statistical counters are per-cpu variables to minimize the
performance overhead in their updates. These counters are exported
via the debugfs filesystem under the qlockstat directory. When the
corresponding debugfs files are read, summation and computing of the
required data are then performed.

The measured latencies for different CPUs are:

CPU Wakeup Kicking
--- ------ -------
Haswell-EX 63.6us 7.4us
Westmere-EX 67.6us 9.3us

The measured latencies varied a bit from run-to-run. The wakeup
latency is much higher than the kicking latency.

A sample of statistical counters after system bootup (with vCPU
overcommit) was:

pv_hash_hops=1.00
pv_kick_unlock=1148
pv_kick_wake=1146
pv_latency_kick=11040
pv_latency_wake=194840
pv_spurious_wakeup=7
pv_wait_again=4
pv_wait_head=23
pv_wait_node=1129

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447114167-47185-6-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# fe055896 20-Oct-2015 Borislav Petkov <bp@suse.de>

x86/microcode: Merge the early microcode loader

Merge the early loader functionality into the driver proper. The
diff is huge but logically, it is simply moving code from the
_early.c files into the main driver.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9a2bc335 20-Oct-2015 Borislav Petkov <bp@suse.de>

x86/microcode: Unmodularize the microcode driver

Make CONFIG_MICROCODE a bool. It was practically a bool already anyway,
since early loader was forcing it to =y.

Regardless, there's no real reason to have something be a module which
gets built-in on the majority of installations out there. And its not
like there's noticeable change in functionality - we still can load late
microcode - just the module glue disappears.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9d99c712 05-Oct-2015 Christian Melki <christian.melki@t2data.com>

swiotlb: Enable it under x86 PAE

Most distributions end up enabling SWIOTLB already with 32-bit
kernels due to the combination of CONFIG_HYPERVISOR_GUEST|CONFIG_XEN=y
as those end up requiring the SWIOTLB.

However for those that are not interested in virtualization and
run in 32-bit they will discover that: "32-bit PAE 4.2.0 kernel
(no IOMMU code) would hang when writing to my USB disk. The kernel
spews million(-ish messages per sec) to syslog, effectively
"hanging" userspace with my kernel.

Oct 2 14:33:06 voodoochild kernel: [ 223.287447] nommu_map_sg:
overflow 25dcac000+1024 of device mask ffffffff
Oct 2 14:33:06 voodoochild kernel: [ 223.287448] nommu_map_sg:
overflow 25dcac000+1024 of device mask ffffffff
Oct 2 14:33:06 voodoochild kernel: [ 223.287449] nommu_map_sg:
overflow 25dcac000+1024 of device mask ffffffff
... etc ..."

Enabling it makes the problem go away.

N.B. With a6dfa128ce5c414ab46b1d690f7a1b8decb8526d
"config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected"
we also have the important part of the SG macros enabled to make this
work properly - in case anybody wants to backport this patch.

Reported-and-Tested-by: Christian Melki <christian.melki@t2data.com>
Signed-off-by: Christian Melki <christian.melki@t2data.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>


# 3dc33bd3 12-Aug-2015 Kees Cook <keescook@chromium.org>

x86/entry/vsyscall: Add CONFIG to control default

Most modern systems can run with vsyscall=none. In an effort to
provide a way for build-time defaults to lack legacy settings,
this adds a new CONFIG to select the type of vsyscall mapping to
use, similar to the existing "vsyscall" command line parameter.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150813005519.GA11696@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1e642812 05-Sep-2015 Ingo Molnar <mingo@kernel.org>

x86/vm86: Fix the misleading CONFIG_VM86 Kconfig help text

The CONFIG_VM86 Kconfig help text is actively misleading, so fix it:

- Don't mark it 'obsolete' in the text as we'll support the ABI as long as CPUs
support it.

- Qualify the part about software emulation and mention that for some apps you
want a real vm86 mode.

- Don't scare users away from the option, instead explain what it does.

Reported-by: Stas Sergeev <stsp@list.ru>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2965faa5 09-Sep-2015 Dave Young <dyoung@redhat.com>

kexec: split kexec_load syscall from kexec core code

There are two kexec load syscalls, kexec_load another and kexec_file_load.
kexec_file_load has been splited as kernel/kexec_file.c. In this patch I
split kexec_load syscall code to kernel/kexec.c.

And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.

The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use
kexec_load syscall can bypass the checking.

Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.

Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig. Also updated general kernel code with to
kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 72b252ae 04-Sep-2015 Mel Gorman <mgorman@suse.de>

mm: send one IPI per CPU to TLB flush all entries after unmapping pages

An IPI is sent to flush remote TLBs when a page is unmapped that was
potentially accesssed by other CPUs. There are many circumstances where
this happens but the obvious one is kswapd reclaiming pages belonging to a
running process as kswapd and the task are likely running on separate
CPUs.

On small machines, this is not a significant problem but as machine gets
larger with more cores and more memory, the cost of these IPIs can be
high. This patch uses a simple structure that tracks CPUs that
potentially have TLB entries for pages being unmapped. When the unmapping
is complete, the full TLB is flushed on the assumption that a refill cost
is lower than flushing individual entries.

Architectures wishing to do this must give the following guarantee.

If a clean page is unmapped and not immediately flushed, the
architecture must guarantee that a write to that linear address
from a CPU with a cached TLB entry will trap a page fault.

This is essentially what the kernel already depends on but the window is
much larger with this patch applied and is worth highlighting. The
architecture should consider whether the cost of the full TLB flush is
higher than sending an IPI to flush each individual entry. An additional
architecture helper called flush_tlb_local is required. It's a trivial
wrapper with some accounting in the x86 case.

The impact of this patch depends on the workload as measuring any benefit
requires both mapped pages co-located on the LRU and memory pressure. The
case with the biggest impact is multiple processes reading mapped pages
taken from the vm-scalability test suite. The test case uses NR_CPU
readers of mapped files that consume 10*RAM.

Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs

4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%)
Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%)
Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%)

4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
User 581.00 611.43
System 5804.93 4111.76
Elapsed 161.03 122.12

This is showing that the readers completed 24.40% faster with 29% less
system CPU time. From vmstats, it is known that the vanilla kernel was
interrupted roughly 900K times per second during the steady phase of the
test and the patched kernel was interrupts 180K times per second.

The impact is lower on a single socket machine.

4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%)
Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%)
Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%)

4.2.0-rc1 4.2.0-rc1
vanilla flushfull-v7
User 58.09 57.64
System 111.82 76.56
Elapsed 27.29 22.55

It's still a noticeable improvement with vmstat showing interrupts went
from roughly 500K per second to 45K per second.

The patch will have no impact on workloads with no memory pressure or have
relatively few mapped pages. It will have an unpredictable impact on the
workload running on the CPU being flushed as it'll depend on how many TLB
entries need to be refilled and how long that takes. Worst case, the TLB
will be completely cleared of active entries when the target PFNs were not
resident at all.

[sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 96601adb 24-Aug-2015 Dan Williams <dan.j.williams@intel.com>

x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB

Given that a write-back (WB) mapping plus non-temporal stores is
expected to be the most efficient way to access PMEM, update the
definition of ARCH_HAS_PMEM_API to imply arch support for
WB-mapped-PMEM. This is needed as a pre-requisite for adding PMEM to
the direct map and mapping it with struct page.

The above clarification for X86_64 means that memcpy_to_pmem() is
permitted to use the non-temporal arch_memcpy_to_pmem() rather than
needlessly fall back to default_memcpy_to_pmem() when the pcommit
instruction is not available. When arch_memcpy_to_pmem() is not
guaranteed to flush writes out of cache, i.e. on older X86_32
implementations where non-temporal stores may just dirty cache,
ARCH_HAS_PMEM_API is simply disabled.

The default fall back for persistent memory handling remains. Namely,
map it with the WT (write-through) cache-type and hope for the best.

arch_has_pmem_api() is updated to only indicate whether the arch
provides the proper helpers to meet the minimum "writes are visible
outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()". Code
that cares whether wmb_pmem() actually flushes writes to pmem must now
call arch_has_wmb_pmem() directly.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
[hch: set ARCH_HAS_PMEM_API=n on x86_32]
Reviewed-by: Christoph Hellwig <hch@lst.de>
[toshi: x86_32 compile fixes]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 67a3e8fe 27-Aug-2015 Ross Zwisler <zwisler@kernel.org>

nd_blk: change aperture mapping from WC to WB

This should result in a pretty sizeable performance gain for reads. For
rough comparison I did some simple read testing using PMEM to compare
reads of write combining (WC) mappings vs write-back (WB). This was
done on a random lab machine.

PMEM reads from a write combining mapping:
# dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s

PMEM reads from a write-back mapping:
# dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s

To be able to safely support a write-back aperture I needed to add
support for the "read flush" _DSM flag, as outlined in the DSM spec:

http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

This flag tells the ND BLK driver that it needs to flush the cache lines
associated with the aperture after the aperture is moved but before any
new data is read. This ensures that any stale cache lines from the
previous contents of the aperture will be discarded from the processor
cache, and the new data will be read properly from the DIMM. We know
that the cache lines are clean and will be discarded without any
writeback because either a) the previous aperture operation was a read,
and we never modified the contents of the aperture, or b) the previous
aperture operation was a write and we must have written back the dirtied
contents of the aperture to the DIMM before the I/O was completed.

In order to add support for the "read flush" flag I needed to add a
generic routine to invalidate cache lines, mmio_flush_range(). This is
protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
only supported on x86.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 7a67832c 18-Aug-2015 Dan Williams <dan.j.williams@intel.com>

libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option

We currently register a platform device for e820 type-12 memory and
register a nvdimm bus beneath it. Registering the platform device
triggers the device-core machinery to probe for a driver, but that
search currently comes up empty. Building the nvdimm-bus registration
into the e820_pmem platform device registration in this way forces
libnvdimm to be built-in. Instead, convert the built-in portion of
CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
rest of the logic to the driver for e820_pmem, for the following
reasons:

1/ Letting e820_pmem support be a module allows building and testing
libnvdimm.ko changes without rebooting

2/ All the normal policy around modules can be applied to e820_pmem
(unbind to disable and/or blacklisting the module from loading by
default)

3/ Moving the driver to a generic location and converting it to scan
"iomem_resource" rather than "e820.map" means any other architecture can
take advantage of this simple nvdimm resource discovery mechanism by
registering a resource named "Persistent Memory (legacy)"

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 648ed940 12-Aug-2015 Chen, Gong <gong.chen@linux.intel.com>

x86/mce: Provide a lockless memory pool to save error records

printk() is not safe to use in MCE context. Add a lockless
memory allocator pool to save error records in MCE context.
Those records will be issued later, in a printk-safe context.
The idea is inspired by the APEI/GHES driver.

We're very conservative and allocate only two pages for it but
since we're going to use those pages throughout the system's
lifetime, we allocate them statically to avoid early boot time
allocation woes.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Rewrite. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439396985-12812-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# a5b9e5a2 30-Jul-2015 Andy Lutomirski <luto@kernel.org>

x86/ldt: Make modify_ldt() optional

The modify_ldt syscall exposes a large attack surface and is
unnecessary for modern userspace. Make it optional.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/a605166a771c343fd64802dece77a903507333bd.1438291540.git.luto@kernel.org
[ Made MATH_EMULATION dependent on MODIFY_LDT_SYSCALL. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5aef51c3 10-Jul-2015 Andy Lutomirski <luto@kernel.org>

x86/kconfig/32: Rename CONFIG_VM86 and default it to 'n'

VM86 is entirely broken if ptrace, syscall auditing, or
NOHZ_FULL is in use. The code is a big undocumented mess, it's
a real PITA to test, and it looks like a big chunk of vm86_32.c
is dead code. It also plays awful games with the entry asm.

No one should be using it anyway. Use DOSBOX or KVM instead.

Let's accelerate its slow death. Remove it from EXPERT and
default it to n. Distros should not enable it. In the unlikely
event that some user needs it, they can easily re-enable it.

While we're at it, rename it to CONFIG_X86_LEGACY_VM86 so that 'make
oldconfig' users will be prompted again. I left CONFIG_VM86 as
an alias to avoid a treewide replacement of the names. We can
clean that up once the current asm and vm86 code churn settles
down.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d29c6cc442d32d4df58849d2f8c89fb39ff88d61.1436542295.git.luto@kernel.org
[ Refined it some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5aaeb5c0 16-Jul-2015 Ingo Molnar <mingo@kernel.org>

x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86

Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.

Also optimize the x86 code a bit by caching task_struct_size.

Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 69711ca1 07-Jul-2015 Sébastien Hinderer <Sebastien.Hinderer@ens-lyon.org>

x86/kconfig: Fix typo in the CONFIG_CMDLINE_BOOL help text

Signed-off-by: Sébastien Hinderer <Sebastien.Hinderer@ens-lyon.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Thibault <Samuel.Thibault@ens-lyon.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9b54050b 22-Jun-2015 Brian Gerst <brgerst@gmail.com>

x86/compat: Separate ia32 and x32 compat ABIs

The x32 ABI is now independent of the ia32 compat ABI. Common
code is now conditional on CONFIG_COMPAT, but unshared code like
syscall entry, signal handling, and the VDSO are under separate
config options.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-13-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0c3619ea 22-Jun-2015 Brian Gerst <brgerst@gmail.com>

x86/compat: Clean up HAVE_UID16 config

Merge the 32-bit compat config setting for HAVE_UID16 with the
32-bit native one.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-12-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3bead553 22-Jun-2015 Brian Gerst <brgerst@gmail.com>

x86/compat: Define ARCH_WANT_OLD_COMPAT_IPC only for 32-bit compat

x32 does not need CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-11-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d6f2d75a 01-Jul-2015 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86/kasan: Move KASAN_SHADOW_OFFSET to the arch Kconfig

KASAN_SHADOW_OFFSET is purely arch specific setting,
so it should be in arch's Kconfig file.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435828178-10975-7-git-send-email-a.ryabinin@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c1bd55f9 30-Jun-2015 Josh Triplett <josh@joshtriplett.org>

x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit

For 32-bit userspace on a 64-bit kernel, this requires modifying
stub32_clone to actually swap the appropriate arguments to match
CONFIG_CLONE_BACKWARDS, rather than just leaving the C argument for tls
broken.

Patch co-authored by Josh Triplett and Thiago Macieira.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3b242c66 30-Jun-2015 Mel Gorman <mgorman@suse.de>

x86: mm: enable deferred struct page initialisation on x86-64

Subject says it all. Other architectures may enable on a case-by-case
basis after auditing early_pfn_to_nid and testing.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 61031952 25-Jun-2015 Ross Zwisler <zwisler@kernel.org>

arch, x86: pmem api for ensuring durability of persistent memory updates

Based on an original patch by Ross Zwisler [1].

Writes to persistent memory have the potential to be posted to cpu
cache, cpu write buffers, and platform write buffers (memory controller)
before being committed to persistent media. Provide apis,
memcpy_to_pmem(), wmb_pmem(), and memremap_pmem(), to write data to
pmem and assert that it is durable in PMEM (a persistent linear address
range). A '__pmem' attribute is added so sparse can track proper usage
of pointers to pmem.

This continues the status quo of pmem being x86 only for 4.2, but
reworks to ioremap, and wider implementation of memremap() will enable
other archs in 4.3.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
[djbw: various reworks]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 9f53f9fa 09-Jun-2015 Dan Williams <dan.j.williams@intel.com>

libnvdimm, pmem: add libnvdimm support to the pmem driver

nd_pmem attaches to persistent memory regions and namespaces emitted by
the libnvdimm subsystem, and, same as the original pmem driver, presents
the system-physical-address range as a block device.

The existing e820-type-12 to pmem setup is converted to an nvdimm_bus
that emits an nd_namespace_io device.

Note that the X in 'pmemX' is now derived from the parent region. This
provides some stability to the pmem devices names from boot-to-boot.
The minor numbers are also more predictable by passing 0 to
alloc_disk().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# c8e56d20 04-Jun-2015 Borislav Petkov <bp@suse.de>

x86: Kill CONFIG_X86_HT

In talking to Aravind recently about making certain AMD topology
attributes available to the MCE injection module, it seemed like
that CONFIG_X86_HT thing is more or less superfluous. It is
def_bool y, depends on SMP and gets enabled in the majority of
.configs - distro and otherwise - out there.

So let's kill it and make code behind it depend directly on SMP.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Walter <dwalter@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433436928-31903-18-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6471b825 03-Jun-2015 Ingo Molnar <mingo@kernel.org>

x86/kconfig: Reorganize arch feature Kconfig select's

Peter Zijstra noticed that in arch/x86/Kconfig there are a lot
of X86_{32,64} clauses in the X86 symbol, plus there are a number
of similar selects in the X86_32 and X86_64 config definitions
as well - which all overlap in an inconsistent mess.

So:

- move all select's from X86_32 and X86_64 to the X64 config
option

- sort their names, so that duplications are easier to spot

- align their if clauses, so that they are easier to identify
at a glance - and so that weirdnesses stand out more

No change in functionality:

105 insertions(+)
105 deletions(-)

Originally-from: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150602153027.GU3644@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b01aec9b 21-May-2015 Borislav Petkov <bp@suse.de>

EDAC: Cleanup atomic_scrub mess

So first of all, this atomic_scrub() function's naming is bad. It looks
like an atomic_t helper. Change it to edac_atomic_scrub().

The bigger problem is that this function is arch-specific and every new
arch which doesn't necessarily need that functionality still needs to
define it, otherwise EDAC doesn't compile.

So instead of doing that and including arch-specific headers, have each
arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c
for ifdeffery. Much cleaner.

And we already are doing this with another symbol - EDAC_SUPPORT. This
is also much cleaner than having CONFIG_EDAC enumerate all the arches
which need/have EDAC support and drivers.

This way I can kill the useless edac.h header in tile too.

Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Chris Metcalf <cmetcalf@ezchip.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "Maciej W. Rozycki" <macro@codesourcery.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Steven J. Hill" <Steven.Hill@imgtec.com>
Cc: x86@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>


# 10455f64 26-May-2015 Toshi Kani <toshi.kani@hp.com>

x86/mm/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP

Simplify the conditions selecting HAVE_ARCH_HUGE_VMAP since
X86_PAE depends on X86_32 already.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-2-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 039ae585 14-May-2015 Pali Rohár <pali@kernel.org>

hwmon: Allow to compile dell-smm-hwmon driver without /proc/i8k

This patch splits CONFIG_I8K compile option to SENSORS_DELL_SMM and CONFIG_I8K.
Option SENSORS_DELL_SMM is now used to enable compilation of dell-smm-hwmon
driver and old CONFIG_I8K option to enable /proc/i8k interface in driver.

So this change allows to compile dell-smm-hwmon driver without legacy /proc/i8k
interface which is needed only for old Dell Inspirion models or for userspace
i8kutils package.

For backward compatibility when CONFIG_I8K is enabled then also SENSORS_DELL_SMM
is enabled and so driver dell-smm-hwmon (with /proc/i8k) is compiled.

Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# c7114b4e 11-May-2015 Waiman Long <Waiman.Long@hp.com>

locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS

To be consistent with the queued spinlocks which use
CONFIG_QUEUED_SPINLOCKS config parameter, the one for the queued
rwlocks is now renamed to CONFIG_QUEUED_RWLOCKS.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431367031-36697-1-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 62c7a1e9 11-May-2015 Ingo Molnar <mingo@kernel.org>

locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS

Valentin Rothberg reported that we use CONFIG_QUEUED_SPINLOCKS
in arch/x86/kernel/paravirt_patch_32.c, while the symbol is
called CONFIG_QUEUED_SPINLOCK. (Note the extra 'S')

But the typo was natural: the proper English term for such
a generic object would be 'queued spinlocks' - so rename
this and related symbols accordingly to the plural form.

Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# cad14bb9 08-May-2015 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/kconfig: Fix the CONFIG_NR_CPUS description

Since:

b53b5eda8194 ("x86/cpu: Increase max CPU count to 8192")

... the maximum supported NR_CPUS for CPUMASK_OFFSTACK case
is 8192. Let's adjust the description to reflect the change.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431080726-2490-1-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c5c19941 08-May-2015 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/kconfig: Bump default NR_CPUS from 8 to 64 for 64-bit configuration

Default NR_CPUS==8 is not enough to cover high-end desktop
configuration: Haswell-E has upto 16 threads.

Let's increase default NR_CPUS to 64 on 64-bit configuration.
With this value CPU bitmask will still fit into one unsigned long.

Default for 32-bit configuration is still 8: it's unlikely
anybody will run 32-bit kernels on modern hardware.

As an alternative we could bump NR_CPUS to 128 to cover all
dual-processor servers with some margin.

For reference: Debian and Suse build their kernels with
NR_CPUS==512, Fedora -- 1024.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431080745-19792-1-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f233f7f1 24-Apr-2015 Peter Zijlstra (Intel) <peterz@infradead.org>

locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching

We use the regular paravirt call patching to switch between:

native_queued_spin_lock_slowpath() __pv_queued_spin_lock_slowpath()
native_queued_spin_unlock() __pv_queued_spin_unlock()

We use a callee saved call for the unlock function which reduces the
i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions
again.

We further optimize the unlock path by patching the direct call with a
"movb $0,%arg1" if we are indeed using the native unlock code. This
makes the unlock code almost as fast as the !PARAVIRT case.

This significantly lowers the overhead of having
CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-10-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d73a3397 24-Apr-2015 Waiman Long <Waiman.Long@hp.com>

locking/qspinlock, x86: Enable x86-64 to use queued spinlocks

This patch makes the necessary changes at the x86 architecture
specific layer to enable the use of queued spinlocks for x86-64. As
x86-32 machines are typically not multi-socket. The benefit of queue
spinlock may not be apparent. So queued spinlocks are not enabled.

Currently, there is some incompatibilities between the para-virtualized
spinlock code (which hard-codes the use of ticket spinlock) and the
queued spinlocks. Therefore, the use of queued spinlocks is disabled
when the para-virtualized spinlock is enabled.

The arch/x86/include/asm/qspinlock.h header file includes some x86
specific optimization which will make the queueds spinlock code
perform better than the generic implementation.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1222e564 05-May-2015 Ingo Molnar <mingo@kernel.org>

x86/platform/uv: Make SGI UV dependent on CONFIG_PCI

Recent PCI changes stopped exporting PCI constants if !CONFIG_PCI,
which made the UV build fail:

arch/x86/kernel/apic/x2apic_uv_x.c:843:16: error: ‘PCI_VGA_STATE_CHANGE_BRIDGE’ undeclared (first use in this function)
arch/x86/kernel/apic/x2apic_uv_x.c:1023:2: error: implicit declaration of function ‘pci_register_set_vga_state’ [-Werror=implicit-function-declaration]

As it's unlikely that an UV bootup will get far without PCI
enumeration, make the platform Kconfig switch (CONFIG_X86_UV)
depend on CONFIG_PCI=y.

Cc: Robin Holt <holt@sgi.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 19e3d60d 04-May-2015 Jan Kiszka <jan.kiszka@siemens.com>

x86: Let x2APIC support depend on interrupt remapping or guest support

We are able to use x2APIC mode in the absence of interrupt remapping on
certain hypervisors. So it is fine to disable IRQ_REMAP without having
to give up x2APIC support.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Link: http://lkml.kernel.org/r/55479709.4030901@siemens.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# baac1695 13-Apr-2015 Jiang Liu <jiang.liu@linux.intel.com>

x86/irq: Remove GENERIC_IRQ_LEGACY_ALLOC_HWIRQ

There's no user of irq_alloc_hwirqs(), irq_alloc_hwirq(),
irq_free_hwirqs() and irq_free_hwirq() in x86 anymore, so remove
GENERIC_IRQ_LEGACY_ALLOC_HWIRQ and related code.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1428978610-28986-8-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 52f518a3 13-Apr-2015 Jiang Liu <jiang.liu@linux.intel.com>

x86/MSI: Use hierarchical irqdomains to manage MSI interrupts

Enhance MSI code to support hierarchical irqdomains, it helps to make
the architecture more clear.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1428905519-23704-14-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# b5dc8e6c 13-Apr-2015 Jiang Liu <jiang.liu@linux.intel.com>

x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors

Abstract CPU local APIC as an interrupt controller and create an
irqdomain for it to manage CPU interrupt vectors. It's the base to
enable hierarchical irqdomains on x86 systems.

The final irqdomain hierarchy will look like this:

IOAPIC domain ----|
MSI/MSI-x domain ----> [Interrupt Remapping domain] -> CPU vector domain
HPET_IRQ domain ----| ^
|
DMAR domain ----------------------------------------------|
HT_IRQ domain ----------------------------------------------|

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1428905519-23704-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# a6dfa128 17-Apr-2015 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected

A huge amount of NIC drivers use the DMA API, however if
compiled under 32-bit an very important part of the DMA API can
be ommitted leading to the drivers not working at all
(especially if used with 'swiotlb=force iommu=soft').

As Prashant Sreedharan explains it: "the driver [tg3] uses
DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of
the dma "mapping" and dma_unmap_addr() to get the "mapping"
value. On most of the platforms this is a no-op, but ... with
"iommu=soft and swiotlb=force" this house keeping is required,
... otherwise we pass 0 while calling pci_unmap_/pci_dma_sync_
instead of the DMA address."

As such enable this even when using 32-bit kernels.

Reported-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Prashant Sreedharan <prashant@broadcom.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Chan <mchan@broadcom.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: cascardo@linux.vnet.ibm.com
Cc: david.vrabel@citrix.com
Cc: sanjeevb@broadcom.com
Cc: siva.kallam@broadcom.com
Cc: vyasevich@gmail.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/20150417190448.GA9462@l.oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4a20799d 14-Apr-2015 Vladimir Murzin <vladimir.murzin@arm.com>

mm: move memtest under mm

Memtest is a simple feature which fills the memory with a given set of
patterns and validates memory contents, if bad memory regions is detected
it reserves them via memblock API. Since memblock API is widely used by
other architectures this feature can be enabled outside of x86 world.

This patch set promotes memtest to live under generic mm umbrella and
enables memtest feature for arm/arm64.

It was reported that this patch set was useful for tracking down an issue
with some errant DMA on an arm64 platform.

This patch (of 6):

There is nothing platform dependent in the core memtest code, so other
platforms might benefit from this feature too.

[linux@roeck-us.net: MEMTEST depends on MEMBLOCK]
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d1fd836d 14-Apr-2015 Kees Cook <keescook@chromium.org>

mm: split ET_DYN ASLR from mmap ASLR

This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips,
powerpc, and x86. The problem is that if there is a leak of ASLR from
the executable (ET_DYN), it means a leak of shared library offset as
well (mmap), and vice versa. Further details and a PoC of this attack
is available here:

http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html

With this patch, a PIE linked executable (ET_DYN) has its own ASLR
region:

$ ./show_mmaps_pie
54859ccd6000-54859ccd7000 r-xp ... /tmp/show_mmaps_pie
54859ced6000-54859ced7000 r--p ... /tmp/show_mmaps_pie
54859ced7000-54859ced8000 rw-p ... /tmp/show_mmaps_pie
7f75be764000-7f75be91f000 r-xp ... /lib/x86_64-linux-gnu/libc.so.6
7f75be91f000-7f75beb1f000 ---p ... /lib/x86_64-linux-gnu/libc.so.6
7f75beb1f000-7f75beb23000 r--p ... /lib/x86_64-linux-gnu/libc.so.6
7f75beb23000-7f75beb25000 rw-p ... /lib/x86_64-linux-gnu/libc.so.6
7f75beb25000-7f75beb2a000 rw-p ...
7f75beb2a000-7f75beb4d000 r-xp ... /lib64/ld-linux-x86-64.so.2
7f75bed45000-7f75bed46000 rw-p ...
7f75bed46000-7f75bed47000 r-xp ...
7f75bed47000-7f75bed4c000 rw-p ...
7f75bed4c000-7f75bed4d000 r--p ... /lib64/ld-linux-x86-64.so.2
7f75bed4d000-7f75bed4e000 rw-p ... /lib64/ld-linux-x86-64.so.2
7f75bed4e000-7f75bed4f000 rw-p ...
7fffb3741000-7fffb3762000 rw-p ... [stack]
7fffb377b000-7fffb377d000 r--p ... [vvar]
7fffb377d000-7fffb377f000 r-xp ... [vdso]

The change is to add a call the newly created arch_mmap_rnd() into the
ELF loader for handling ET_DYN ASLR in a separate region from mmap ASLR,
as was already done on s390. Removes CONFIG_BINFMT_ELF_RANDOMIZE_PIE,
which is no longer needed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Russell King <linux@arm.linux.org.uk>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Arun Chandran <achandran@mvista.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Min-Hua Chen <orca.chen@gmail.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Vineeth Vijayan <vvijayan@mvista.com>
Cc: Jeff Bailey <jeffbailey@google.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Behan Webster <behanw@converseincode.com>
Cc: Ismael Ripoll <iripoll@upv.es>
Cc: Jan-Simon Mller <dl9pf@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2b68f6ca 14-Apr-2015 Kees Cook <keescook@chromium.org>

mm: expose arch_mmap_rnd when available

When an architecture fully supports randomizing the ELF load location,
a per-arch mmap_rnd() function is used to find a randomized mmap base.
In preparation for randomizing the location of ET_DYN binaries
separately from mmap, this renames and exports these functions as
arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE
for describing this feature on architectures that support it
(which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390
already supports a separated ET_DYN ASLR from mmap ASLR without the
ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Russell King <linux@arm.linux.org.uk>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Arun Chandran <achandran@mvista.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Min-Hua Chen <orca.chen@gmail.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Vineeth Vijayan <vvijayan@mvista.com>
Cc: Jeff Bailey <jeffbailey@google.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Behan Webster <behanw@converseincode.com>
Cc: Ismael Ripoll <iripoll@upv.es>
Cc: Jan-Simon Mller <dl9pf@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6b637835 14-Apr-2015 Toshi Kani <toshi.kani@hp.com>

x86, mm: support huge KVA mappings on x86

Implement huge KVA mapping interfaces on x86.

On x86, MTRRs can override PAT memory types with a 4KB granularity. When
using a huge page, MTRRs can override the memory type of the huge page,
which may lead a performance penalty. The processor can also behave in an
undefined manner if a huge page is mapped to a memory range that MTRRs
have mapped with multiple different memory types. Therefore, the mapping
code falls back to use a smaller page size toward 4KB when a mapping range
is covered by non-WB type of MTRRs. The WB type of MTRRs has no affect on
the PAT memory types.

pud_set_huge() and pmd_set_huge() call mtrr_type_lookup() to see if a
given range is covered by MTRRs. MTRR_TYPE_WRBACK indicates that the
range is either covered by WB or not covered and the MTRR default value is
set to WB. 0xFF indicates that MTRRs are disabled.

HAVE_ARCH_HUGE_VMAP is selected when X86_64 or X86_32 with X86_PAE is set.
X86_32 without X86_PAE is not supported since such config can unlikey be
benefited from this feature, and there was an issue found in testing.

[fengguang.wu@intel.com: ioremap_pud_capable can be static]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Robert Elliott <Elliott@hp.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 98233368 14-Apr-2015 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86: expose number of page table levels on Kconfig level

We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ec776ef6 01-Apr-2015 Christoph Hellwig <hch@lst.de>

x86/mm: Add support for the non-standard protected e820 type

Various recent BIOSes support NVDIMMs or ADR using a
non-standard e820 memory type, and Intel supplied reference
Linux code using this type to various vendors.

Wire this e820 table type up to export platform devices for the
pmem driver so that we can use it in Linux.

Based on earlier work from:

Dave Jiang <dave.jiang@intel.com>
Dan Williams <dan.j.williams@intel.com>

Includes fixes for NUMA regions from Boaz Harrosh.

Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-nvdimm@ml01.01.org
Link: http://lkml.kernel.org/r/1427872339-6688-2-git-send-email-hch@lst.de
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6e0a0ea1 24-Mar-2015 Graeme Gregory <graeme.gregory@linaro.org>

ACPI / sleep: Introduce CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT

ACPI 5.1 does not currently support S states for ARM64 hardware but
ACPI code will call acpi_target_system_state() and acpi_sleep_init()
for device power management, so introduce
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT and select it for x86 and
ia64 only to make sleep functions available, and also introduce stub
function to allow other drivers to function until S states are defined
for ARM64.

It will be no functional change for x86 and IA64.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# d8eb8940 13-Mar-2015 Borislav Petkov <bp@suse.de>

x86/kexec: Cleanup KEXEC_VERIFY_SIG Kconfig help text

Fix typos and also make it simpler without losing the gist of what it says.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1426251877-11415-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9ab6eb51 05-Mar-2015 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

x86/intel/quark: Select COMMON_CLK

The commit 8bbc2a135b63 ("x86/intel/quark: Add Intel Quark
platform support") introduced a minimal support of Intel Quark
SoC. That allows to use core parts of the SoC. However, the SPI,
I2C, and GPIO drivers can't be selected by kernel configuration
because they depend on COMMON_CLK. The patch adds a COMMON_CLK
selection to the platfrom definition to allow user choose the drivers.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Ong, Boon Leong <boon.leong.ong@intel.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Darren Hart <dvhart@linux.intel.com>
Fixes: 8bbc2a135b63 ("x86/intel/quark: Add Intel Quark platform support")
Link: http://lkml.kernel.org/r/1425569044-2867-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 10971ab2 05-Mar-2015 Ingo Molnar <mingo@kernel.org>

x86/mm: Further simplify 1 GB kernel linear mappings handling

It's a bit pointless to allow Kconfig configuration for 1GB kernel
mappings, it's already hidden behind a 'default y' and CONFIG_EXPERT.

Remove this complication and simplify the code by renaming
CONFIG_ENABLE_DIRECT_GBPAGES to CONFIG_X86_DIRECT_GBPAGES and
document the DEBUG_PAGE_ALLOC and KMEMCHECK quirks.

Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: JBeulich@suse.com
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: julia.lawall@lip6.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e5008abe 04-Mar-2015 Luis R. Rodriguez <mcgrof@suse.com>

x86/mm: Simplify enabling direct_gbpages

direct_gbpages can be force enabled as an early parameter
but not really have taken effect when DEBUG_PAGEALLOC
or KMEMCHECK is enabled. You can also enable direct_gbpages
right now if you have an x86_64 architecture but your CPU
doesn't really have support for this feature. In both cases
PG_LEVEL_1G won't actually be enabled but direct_gbpages is used
in other areas under the assumptions that PG_LEVEL_1G
was set. Fix this by putting together all requirements
which make this feature sensible to enable under, and only
enable both finally flipping on PG_LEVEL_1G and leaving
PG_LEVEL_1G set when this is true.

We only enable this feature then to be possible on sensible
builds defined by the new ENABLE_DIRECT_GBPAGES. If the
CPU has support for it you can either enable this by using
the DIRECT_GBPAGES option or using the early kernel parameter.
If a platform had support for this you can always force disable
it as well.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: JBeulich@suse.com
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: julia.lawall@lip6.fr
Link: http://lkml.kernel.org/r/1425518654-3403-3-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 8bbc2a13 30-Jan-2015 Bryan O'Donoghue <pure.logic@nexus-software.ie>

x86/intel/quark: Add Intel Quark platform support

Add Intel Quark platform support. Quark needs to pull down all
unlocked IMRs to ensure agreement with the EFI memory map post
boot.

This patch adds an entry in Kconfig for Quark as a platform and
makes IMR support mandatory if selected.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ong, Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Reviewed-by: Andy Shevchenko <andy.schevchenko@gmail.com>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Ong, Boon Leong <boon.leong.ong@intel.com>
Cc: dvhart@infradead.org
Link: http://lkml.kernel.org/r/1422635379-12476-3-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 50849eef 05-Feb-2015 Jan Beulich <JBeulich@suse.com>

x86/Kconfig: Simplify X86_UP_APIC handling

We don't really need a helper symbol for that. For one, it's
pointlessly getting set to Y for all configurations (even 64-bit
ones). And then the purpose can be fulfilled by suitably
adjusting X86_UP_APIC: Hide its prompt when PCI_MSI, and default
it to PCI_MSI.

Tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/54D39AFC020000780005D684@mail.emea.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b1da1e71 05-Feb-2015 Jan Beulich <JBeulich@suse.com>

x86/Kconfig: Simplify X86_IO_APIC dependencies

Since dependencies are transitive, we don't really need to
repeat those of X86_UP_IOAPIC.

Furthermore avoid the symbol getting entered into .config when
it is off by having the default simply Y and the dependencies
solely handled via the intended for that purpose "depends on".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/54D39BC9020000780005D688@mail.emea.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e0fd24a3 05-Feb-2015 Jan Beulich <JBeulich@suse.com>

x86/Kconfig: Avoid issuing pointless turned off entries to .config

Settings without prompts shouldn't normally have defaults other
than Y, as otherwise they (a) needlessly enlarge the resulting
.config and (b) if they ever get a prompt added later, the
tracked setting of off will prevent the devloper from then being
prompted for his/her choice when doing an incremental update of
the configuration (make oldconfig).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/54D39CC6020000780005D6AE@mail.emea.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ef7f0d6a 13-Feb-2015 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86_64: add KASan support

This patch adds arch specific code for kernel address sanitizer.

16TB of virtual addressed used for shadow memory. It's located in range
[ffffec0000000000 - fffffc0000000000] between vmemmap and %esp fixup
stacks.

At early stage we map whole shadow region with zero page. Latter, after
pages mapped to direct mapping address range we unmap zero pages from
corresponding shadow (see kasan_map_shadow()) and allocate and map a real
shadow memory reusing vmemmap_populate() function.

Also replace __pa with __pa_nodebug before shadow initialized. __pa with
CONFIG_DEBUG_VIRTUAL=y make external function call (__phys_addr)
__phys_addr is instrumented, so __asan_load could be called before shadow
area initialized.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 92082a88 05-Feb-2015 Ken Xue <Ken.Xue@amd.com>

ACPI: add AMD ACPI2Platform device support for x86 system

This new feature is to interpret AMD specific ACPI device to
platform device such as I2C, UART, GPIO found on AMD CZ and
later chipsets. It based on example intel LPSS. Now, it can
support AMD I2C, UART and GPIO.

Signed-off-by: Ken Xue <Ken.Xue@amd.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 12cf89b5 03-Feb-2015 Josh Poimboeuf <jpoimboe@redhat.com>

livepatch: rename config to CONFIG_LIVEPATCH

Rename CONFIG_LIVE_PATCHING to CONFIG_LIVEPATCH to make the naming of
the config and the code more consistent.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# ba360f887 24-Jan-2015 Thomas Gleixner <tglx@linutronix.de>

x86, init: Fix UP boot regression on x86_64

Commit 30b8b0066caf "init: Get rid of x86isms" broke the UP boot on
x86_64. That happens because CONFIG_UP_LATE_INIT depends on
CONFIG_X86_UP_APIC. X86_UP_APIC is a 32bit only config switch and
therefor not set on 64bit UP builds. As a consequence the UP init of
the local APIC and the IOAPIC is not called, which results in a boot
failure.

Make it depend on !SMP && X86_LOCAL_APIC instead.

Fixes: 30b8b0066caf init: Get rid of x86isms
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 38a1dfda 22-Jan-2015 Bryan O'Donoghue <pure.logic@nexus-software.ie>

x86/apic: Re-enable PCI_MSI support for non-SMP X86_32

Commit 0dbc6078c06bc0 ('x86, build, pci: Fix PCI_MSI build on !SMP')
introduced the dependency that X86_UP_APIC is only available when
PCI_MSI is false. This effectively prevents PCI_MSI support on 32bit
UP systems because it disables both APIC and IO-APIC. But APIC support
is architecturally required for PCI_MSI.

The intention of the patch was to enforce APIC support when PCI_MSI is
enabled, but failed to do so.

Remove the !PCI_MSI dependency from X86_UP_APIC and enforce
X86_UP_APIC when PCI_MSI support is enabled on 32bit UP systems.

[ tglx: Massaged changelog ]

Fixes 0dbc6078c06bc0 'x86, build, pci: Fix PCI_MSI build on !SMP'
Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1421967529-9037-1-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 30b8b006 15-Jan-2015 Thomas Gleixner <tglx@linutronix.de>

init: Get rid of x86isms

The UP local API support can be set up from an early initcall. No need
for horrible hackery in the init code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211703.827943883@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 32b7eb87 19-Jan-2015 Miroslav Benes <mbenes@suse.cz>

livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING

Change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING in Kconfigs. HAVE_
bools are prevalent there and we should go with the flow.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 83fe27ea 05-Dec-2014 Pranith Kumar <bobby.prani@gmail.com>

rcu: Make SRCU optional by using CONFIG_SRCU

SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.

The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.

If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.

text data bss dec hex filename
2007 0 0 2007 7d7 kernel/rcu/srcu.o

Size of arch/powerpc/boot/zImage changes from

text data bss dec hex filename
831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before
829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after

so the savings are about ~2000 bytes.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]


# b700e7f0 16-Dec-2014 Seth Jennings <sjenning@redhat.com>

livepatch: kernel: add support for live patching

This commit introduces code for the live patching core. It implements
an ftrace-based mechanism and kernel interface for doing live patching
of kernel and kernel module functions.

It represents the greatest common functionality set between kpatch and
kgraft and can accept patches built using either method.

This first version does not implement any consistency mechanism that
ensures that old and new code do not run together. In practice, ~90% of
CVEs are safe to apply in this way, since they simply add a conditional
check. However, any function change that can not execute safely with
the old version of the function can _not_ be safely applied in this
version.

[ jkosina@suse.cz: due to the number of contributions that got folded into
this original patch from Seth Jennings, add SUSE's copyright as well, as
discussed via e-mail ]

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 2f600025 27-Oct-2014 Jiang Liu <jiang.liu@linux.intel.com>

x86, irq: Make MSI and HT_IRQ indepenent of X86_IO_APIC

Now we have splitted functions to support MSI and HT_IRQ into vector.c,
and they have no dependency on IOAPIC any more. So change Kconfig files
to make MSI and HT_IRQ independent of X86_IO_APIC.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1414397531-28254-16-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 74afab7a 27-Oct-2014 Jiang Liu <jiang.liu@linux.intel.com>

x86, irq: Move local APIC related code from io_apic.c into vector.c

Create arch/x86/kernel/apic/vector.c to host local APIC related code,
prepare for making MSI/HT_IRQ independent of IOAPIC.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Grant Likely <grant.likely@linaro.org>
Link: http://lkml.kernel.org/r/1414397531-28254-10-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 72e9b5fe 12-Dec-2014 Dave Hansen <dave.hansen@linux.intel.com>

x86, mpx: Give MPX a real config option prompt

Give MPX a real config option. The CPUs that support it (referenced
here):

https://software.intel.com/en-us/forums/topic/402393

are not available publicly yet. Right now only the software emulator
provides MPX for the general public.

[ tglx: Make it default off. There is no point in having it on right
now as no hardware and no proper tooling support are available ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141212183836.2569D58D@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 957e3fac 12-Dec-2014 Riku Voipio <riku.voipio@linaro.org>

gcov: enable GCOV_PROFILE_ALL from ARCH Kconfigs

Following the suggestions from Andrew Morton and Stephen Rothwell,
Dont expand the ARCH list in kernel/gcov/Kconfig. Instead,
define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures
can enable.

set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was
previously allowed + ARM64 which I tested.

Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 57319d80 14-Nov-2014 Qiaowei Ren <qiaowei.ren@intel.com>

x86, mpx: Add MPX-specific mmap interface

We have chosen to perform the allocation of bounds tables in
kernel (See the patch "on-demand kernel allocation of bounds
tables") and to mark these VMAs with VM_MPX.

However, there is currently no suitable interface to actually do
this. Existing interfaces, like do_mmap_pgoff(), have no way to
set a modified ->vm_ops or ->vm_flags and don't hold mmap_sem
long enough to let a caller do it.

This patch wraps mmap_region() and hold mmap_sem long enough to
make the modifications to the VMA which we need.

Also note the 32/64-bit #ifdef in the header. We actually need
to do this at runtime eventually. But, for now, we don't support
running 32-bit binaries on 64-bit kernels. Support for this will
come in later patches.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151827.CE440F67@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# ce5686d4 29-Oct-2014 Peter Zijlstra (Intel) <peterz@infradead.org>

perf/x86: Fix embarrasing typo

Because we're all human and typing sucks..

Fixes: 7fb0f1de49fc ("perf/x86: Fix compile warnings for intel_uncore")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/n/tip-be0bftjh8yfm4uvmvtf3yi87@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1ad83c85 29-Oct-2014 Andy Lutomirski <luto@amacapital.net>

x86_64,vsyscall: Make vsyscall emulation configurable

This adds CONFIG_X86_VSYSCALL_EMULATION, guarded by CONFIG_EXPERT.
Turning it off completely disables vsyscall emulation, saving ~3.5k
for vsyscall_64.c, 4k for vsyscall_emu_64.S (the fake vsyscall
page), some tiny amount of core mm code that supports a gate area,
and possibly 4k for a wasted pagetable. The latter is because the
vsyscall addresses are misaligned and fit poorly in the fixmap.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/406db88b8dd5f0cbbf38216d11be34bbb43c7eae.1414618407.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 7fb0f1de 24-Oct-2014 Peter Zijlstra <peterz@infradead.org>

perf/x86: Fix compile warnings for intel_uncore

The uncore drivers require PCI and generate compile time warnings when
!CONFIG_PCI.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6a33979d 09-Oct-2014 Mel Gorman <mgorman@suse.de>

mm: remove misleading ARCH_USES_NUMA_PROT_NONE

ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
_PAGE_NUMA using _PROT_NONE. This saved using an additional PTE bit and
relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
fault scanner. This was found to be conceptually confusing with a lot of
implicit assumptions and it was asked that an alternative be found.

Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
bits and shrunk the maximum possible swap size but it did not go far
enough. There are no architectures that reuse _PROT_NONE as _PROT_NUMA
but the relics still exist.

This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
duplication in powerpc vs the generic implementation by defining the types
the core NUMA helpers expected to exist from x86 with their ppc64
equivalent. This necessitated that a PTE bit mask be created that
identified the bits that distinguish present from NUMA pte entries but it
is expected this will only differ between arches based on _PAGE_PROTNONE.
The naming for the generic helpers was taken from x86 originally but ppc64
has types that are equivalent for the purposes of the helper so they are
mapped instead of duplicating code.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ed2226bd 17-Sep-2014 David E. Box <david.e.box@linux.intel.com>

x86/platform/intel/iosf: Add debugfs config option for IOSF

Makes the IOSF sideband available through debugfs. Allows
developers to experiment with using the sideband to provide
debug and analytical tools for units on the SoC.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1411017231-20807-4-git-send-email-david.e.box@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ced3ce76 17-Sep-2014 David E. Box <david.e.box@linux.intel.com>

x86/platform/intel/iosf: Add better description of IOSF driver in config

Adds better description of IOSF driver to determine when it
should be enabled. Also moves the Kconfig option to "Processor
type and features" menu from main configuration menu.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1411017231-20807-3-git-send-email-david.e.box@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 72d93104 13-Sep-2014 Linus Torvalds <torvalds@linux-foundation.org>

Make ARCH_HAS_FAST_MULTIPLIER a real config variable

It used to be an ad-hoc hack defined by the x86 version of
<asm/bitops.h> that enabled a couple of library routines to know whether
an integer multiply is faster than repeated shifts and additions.

This just makes it use the real Kconfig system instead, and makes x86
(which was the only architecture that did this) select the option.

NOTE! Even for x86, this really is kind of wrong. If we cared, we would
probably not enable this for builds optimized for netburst (P4), where
shifts-and-adds are generally faster than multiplies. This patch does
*not* change that kind of logic, though, it is purely a syntactic change
with no code changes.

This was triggered by the fact that we have other places that really
want to know "do I want to expand multiples by constants by hand or
not", particularly the hash generation code.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 74ca317c 29-Aug-2014 Vivek Goyal <vgoyal@redhat.com>

kexec: create a new config option CONFIG_KEXEC_FILE for new syscall

Currently new system call kexec_file_load() and all the associated code
compiles if CONFIG_KEXEC=y. But new syscall also compiles purgatory
code which currently uses gcc option -mcmodel=large. This option seems
to be available only gcc 4.4 onwards.

Hiding new functionality behind a new config option will not break
existing users of old gcc. Those who wish to enable new functionality
will require new gcc. Having said that, I am trying to figure out how
can I move away from using -mcmodel=large but that can take a while.

I think there are other advantages of introducing this new config
option. As this option will be enabled only on x86_64, other arches
don't have to compile generic kexec code which will never be used. This
new code selects CRYPTO=y and CRYPTO_SHA256=y. And all other arches had
to do this for CONFIG_KEXEC. Now with introduction of new config
option, we can remove crypto dependency from other arches.

Now CONFIG_KEXEC_FILE is available only on x86_64. So whereever I had
CONFIG_X86_64 defined, I got rid of that.

For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to
"depends on CRYPTO=y". This should be safer as "select" is not
recursive.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# aa8e4f22 27-Aug-2014 David E. Box <david.e.box@linux.intel.com>

x86/iosf: Add Kconfig prompt for IOSF_MBI selection

Fixes an error in having the iosf build as 'default m'. On X86 SoC's the iosf
sideband is the only way to access information for some registers, as opposed to
through MSR's on other Intel architectures. While selecting IOSF_MBI is
preferred, it does mean carrying extra code on non-SoC architectures. This
exports the selection to the user, allowing those driver writers to compile out
iosf code if it's not being built.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1409175640-32426-2-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 9def39be 30-Oct-2013 Josh Triplett <josh@joshtriplett.org>

x86: Support compiling out human-friendly processor feature names

The table mapping CPUID bits to human-readable strings takes up a
non-trivial amount of space, and only exists to support /proc/cpuinfo
and a couple of kernel messages. Since programs depend on the format of
/proc/cpuinfo, force inclusion of the table when building with /proc
support; otherwise, support omitting that table to save space, in which
case the kernel messages will print features numerically instead.

In addition to saving 1408 bytes out of vmlinux, this also saves 1373
bytes out of the uncompressed setup code, which contributes directly to
the size of bzImage.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>


# 8e7d8381 08-Aug-2014 Vivek Goyal <vgoyal@redhat.com>

kexec: verify the signature of signed PE bzImage

This is the final piece of the puzzle of verifying kernel image signature
during kexec_file_load() syscall.

This patch calls into PE file routines to verify signature of bzImage. If
signature are valid, kexec_file_load() succeeds otherwise it fails.

Two new config options have been introduced. First one is
CONFIG_KEXEC_VERIFY_SIG. This option enforces that kernel has to be
validly signed otherwise kernel load will fail. If this option is not
set, no signature verification will be done. Only exception will be when
secureboot is enabled. In that case signature verification should be
automatically enforced when secureboot is enabled. But that will happen
when secureboot patches are merged.

Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option
enables signature verification support on bzImage. If this option is not
set and previous one is set, kernel image loading will fail because kernel
does not have support to verify signature of bzImage.

I tested these patches with both "pesign" and "sbsign" signed bzImages.

I used signing_key.priv key and signing_key.x509 cert for signing as
generated during kernel build process (if module signing is enabled).

Used following method to sign bzImage.

pesign
======
- Convert DER format cert to PEM format cert
openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform
PEM

- Generate a .p12 file from existing cert and private key file
openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in
signing_key.x509.PEM

- Import .p12 file into pesign db
pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign

- Sign bzImage
pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign
-c "Glacier signing key - Magrathea" -s

sbsign
======
sbsign --key signing_key.priv --cert signing_key.x509.PEM --output
/boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+

Patch details:

Well all the hard work is done in previous patches. Now bzImage loader
has just call into that code and verify whether bzImage signature are
valid or not.

Also create two config options. First one is CONFIG_KEXEC_VERIFY_SIG.
This option enforces that kernel has to be validly signed otherwise kernel
load will fail. If this option is not set, no signature verification will
be done. Only exception will be when secureboot is enabled. In that case
signature verification should be automatically enforced when secureboot is
enabled. But that will happen when secureboot patches are merged.

Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option
enables signature verification support on bzImage. If this option is not
set and previous one is set, kernel image loading will fail because kernel
does not have support to verify signature of bzImage.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Matt Fleming <matt@console-pimps.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 12db5562 08-Aug-2014 Vivek Goyal <vgoyal@redhat.com>

kexec: load and relocate purgatory at kernel load time

Load purgatory code in RAM and relocate it based on the location.
Relocation code has been inspired by module relocation code and purgatory
relocation code in kexec-tools.

Also compute the checksums of loaded kexec segments and store them in
purgatory.

Arch independent code provides this functionality so that arch dependent
bootloaders can make use of it.

Helper functions are provided to get/set symbol values in purgatory which
are used by bootloaders later to set things like stack and entry point of
second kernel etc.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# de5b56ba 08-Aug-2014 Vivek Goyal <vgoyal@redhat.com>

kernel: build bin2c based on config option CONFIG_BUILD_BIN2C

currently bin2c builds only if CONFIG_IKCONFIG=y. But bin2c will now be
used by kexec too. So make it compilation dependent on CONFIG_BUILD_BIN2C
and this config option can be selected by CONFIG_KEXEC and CONFIG_IKCONFIG.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 308c09f1 08-Aug-2014 Laura Abbott <lauraa@codeaurora.org>

lib/scatterlist: make ARCH_HAS_SG_CHAIN an actual Kconfig

Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
architecture specific scatterlist.h, make it a proper Kconfig option and
use that instead. At same time, remove the header files are are now
mostly useless and just include asm-generic/scatterlist.h.

[sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [powerpc]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7b2a583a 11-Jul-2014 Matt Fleming <matt.fleming@intel.com>

x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub

Without CONFIG_RELOCATABLE the early boot code will decompress the
kernel to LOAD_PHYSICAL_ADDR. While this may have been fine in the BIOS
days, that isn't going to fly with UEFI since parts of the firmware
code/data may be located at LOAD_PHYSICAL_ADDR.

Straying outside of the bounds of the regions we've explicitly requested
from the firmware will cause all sorts of trouble. Bruno reports that
his machine resets while trying to decompress the kernel image.

We already go to great pains to ensure the kernel is loaded into a
suitably aligned buffer, it's just that the address isn't necessarily
LOAD_PHYSICAL_ADDR, because we can't guarantee that address isn't in-use
by the firmware.

Explicitly enforce CONFIG_RELOCATABLE for the EFI boot stub, so that we
can load the kernel at any address with the correct alignment.

Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Tested-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>


# b16d8c23 04-Aug-2014 Matt Fleming <matt.fleming@intel.com>

x86/efi: Fix 3DNow optimization build failure in EFI stub

Building a 32-bit kernel with CONFIG_X86_USE_3DNOW and CONFIG_EFI_STUB
leads to the following build error,

drivers/firmware/efi/libstub/lib.a(efi-stub-helper.o): In function `efi_relocate_kernel':
efi-stub-helper.c:(.text+0xda5): undefined reference to `_mmx_memcpy'

This is due to the fact that the EFI boot stub pulls in the 3DNow
optimized versions of the memcpy() prototype from
arch/x86/include/asm/string_32.h, even though the _mmx_memcpy()
implementation isn't available in the EFI stub.

For now, predicate CONFIG_EFI_STUB on !CONFIG_X86_USE_3DNOW. This is
most definitely a temporary fix. A complete solution will involve
selectively including kernel headers/symbols into the early-boot
execution environment of the EFI boot stub, i.e. something analogous to
the way that the _SETUP symbol is used.

Previous attempts have been made to fix this kind of problem, though
none seem to have ever been merged,

http://lkml.kernel.org/r/20120329104822.GA17233@x1.osrc.amd.com

Clearly, this problem has been around for a long time.

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1407193939-27813-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 93e5eadd 30-Jun-2014 Li, Aubrey <aubrey.li@linux.intel.com>

x86/platform: New Intel Atom SOC power management controller driver

The Power Management Controller (PMC) controls many of the power
management features present in the Atom SoC. This driver provides
a native power off function via PMC PCI IO port.

On some ACPI hardware-reduced platforms(e.g. ASUS-T100), ACPI sleep
registers are not valid so that (*pm_power_off)() is not hooked by
acpi_power_off(). The power off function in this driver is installed
only when pm_power_off is NULL.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Link: http://lkml.kernel.org/r/53B0FEEA.3010805@linux.intel.com
Signed-off-by: Lejun Zhu <lejun.zhu@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 09ec5442 16-Jul-2014 Thomas Gleixner <tglx@linutronix.de>

clocksource: Move cycle_last validation to core code

The only user of the cycle_last validation is the x86 TSC. In order to
provide NMI safe accessor functions for clock monotonic and
monotonic_raw we need to do that in the core.

We can't do the TSC specific

if (now < cycle_last)
now = cycle_last;

for the other wrapping around clocksources, but TSC has
CLOCKSOURCE_MASK(64) which actually does not mask out anything so if
now is less than cycle_last the subtraction will give a negative
result. So we can check for that in clocksource_delta() and return 0
for that case.

Implement and enable it for x86

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>


# 24e4a8c3 16-Jul-2014 John Stultz <john.stultz@linaro.org>

ktime: Kill non-scalar ktime_t implementation for 2038

The non-scalar ktime_t implementation is basically a timespec
which has to be changed to support dates past 2038 on 32bit
systems.

This patch removes the non-scalar ktime_t implementation, forcing
the scalar s64 nanosecond version on all architectures.

This may have additional performance overhead on some 32bit
systems when converting between ktime_t and timespec structures,
however the majority of 32bit systems (arm and i386) were already
using scalar ktime_t, so no performance regressions will be seen
on those platforms.

On affected platforms, I'm open to finding optimizations, including
avoiding converting to timespecs where possible.

[ tglx: We can now cleanup the ktime_t.tv64 mess, but thats a
different issue and we can throw a coccinelle script at it ]

Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>


# 44a69f61 22-Jul-2014 Tomasz Nowicki <tomasz.nowicki@linaro.org>

acpi, apei, ghes: Make NMI error notification to be GHES architecture extension.

Currently APEI depends on x86 architecture. It is because of NMI hardware
error notification of GHES which is currently supported by x86 only.
However, many other APEI features can be still used perfectly by other
architectures.

This commit adds two symbols:
1. HAVE_ACPI_APEI for those archs which support APEI.
2. HAVE_ACPI_APEI_NMI which is used for NMI code isolation in ghes.c
file. NMI related data and functions are grouped so they can be wrapped
inside one #ifdef section. Appropriate function stubs are provided for
!NMI case.

Note there is no functional changes for x86 due to hard selected
HAVE_ACPI_APEI and HAVE_ACPI_APEI_NMI symbols.

Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 46ba51ea 18-Jul-2014 Hanjun Guo <guohanjun@huawei.com>

ACPI / processor: Introduce ARCH_MIGHT_HAVE_ACPI_PDC

The use of _PDC is deprecated in ACPI 3.0 in favor of _OSC,
as ARM platform is supported only in ACPI 5.0 or higher version,
_PDC will not be used in ARM platform, so make Make _PDC only for
platforms with Intel CPUs.

Introduce ARCH_MIGHT_HAVE_ACPI_PDC and move _PDC related code in
ACPI processor driver into a single file processor_pdc.c, make x86
and ia64 select it when ACPI is enabled.

This patch also use pr_* to replace printk to fix the checkpatch
warning and factor acpi_processor_alloc_pdc() a little bit to
avoid duplicate pr_err() code.

Suggested-by: Robert Richter <rric@kernel.org>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8a1664be 18-Jul-2014 Graeme Gregory <graeme.gregory@linaro.org>

ACPI: add config for BIOS table scan

With the addition of ARM64 that does not have a traditional BIOS to
scan, add a config option which is selected on x86 (ia64 doesn't need
it either, it is EFI/UEFI based system) to do the traditional BIOS
scanning for tables.

Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fdc841b5 25-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ftrace: x86: Remove check of obsolete variable function_trace_stop

Nothing sets function_trace_stop to disable function tracing anymore.
Remove the check for it in the arch code.

Link: http://lkml.kernel.org/r/53C54D32.6000000@zytor.com

Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4badad35 06-Jun-2014 Peter Zijlstra <peterz@infradead.org>

locking/mutex: Disable optimistic spinning on some architectures

The optimistic spin code assumes regular stores and cmpxchg() play nice;
this is found to not be true for at least: parisc, sparc32, tile32,
metag-lock1, arc-!llsc and hexagon.

There is further wreckage, but this in particular seemed easy to
trigger, so blacklist this.

Opt in for known good archs.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: stable@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 022ee6c5 25-Jun-2014 Ard Biesheuvel <ardb@kernel.org>

efi/x86: Move UEFI Runtime Services wrappers to generic code

In order for other archs (such as arm64) to be able to reuse the virtual
mode function call wrappers, move them to drivers/firmware/efi/runtime-wrappers.c.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>


# d7f3d478 09-Jun-2014 Jiang Liu <jiang.liu@linux.intel.com>

x86, irq: Introduce mechanisms to support dynamically allocate IRQ for IOAPIC

Currently x86 support identity mapping between GSI(IOAPIC pin) and IRQ
number, so continous IRQs at low end are statically allocated to IOAPICs
at boot time. This design causes trouble to support IOAPIC hotplug.

This patch implements basic mechanism to dynamically allocate IRQ on
demand for IOAPIC pins by using irqdomain framework.

It first adds several fields into struct ioapic to support irqdomain.
Then it implements an algorithm to dynamically allocate IRQ number
for IOAPIC pins on demand.

Currently it supports three types of irqdomain:
1) LEGACY: used to support IOAPIC hosting legacy IRQs and building
identity mapping for legacy IRQs. A speical case, we dynamically
allocate IRQ number for IOAPIC pin which has GSI number below
nr_legacy_irqs() but isn't legacy IRQ. This is for backward
compatibility and avoid regression.
2) STRICT: build identity mapping between GSI and IRQ nubmer.
3) DYNAMIC: dynamically allocate IRQ number for IOAPIC pin on demand.

Legacy(ISA) IRQs is not managed by irqdomain because there may be
multiple pins sharing the same IRQ number and current irqdomain only
supports 1:1 mapping between pins and IRQ.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Link: http://lkml.kernel.org/r/1402302011-23642-24-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 6084a6e2 09-Jun-2014 Jiang Liu <jiang.liu@linux.intel.com>

x86: ce4100, irq: Make CE4100 depend on CONFIG_X86_IO_APIC

Intel CE4100 platforms need IOAPIC support becasue some devices are
always connected to the second IOAPIC, so make CONFIG_CE depends on
CONFIG_X86_IO_APIC.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1402302011-23642-18-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 24f2e027 13-Jun-2014 Kees Cook <keescook@chromium.org>

x86, kaslr: boot-time selectable with hibernation

Changes kASLR from being compile-time selectable (blocked by
CONFIG_HIBERNATION), to being boot-time selectable (with hibernation
available by default) via the "kaslr" kernel command line.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bd01ec1a 03-Feb-2014 Waiman Long <Waiman.Long@hp.com>

x86, locking/rwlocks: Enable qrwlocks on x86

Make x86 use the fair rwlock_t.

Implement the custom queue_write_unlock() for best performance.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
[peterz: near complete rewrite]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
Link: http://lkml.kernel.org/n/tip-r1xuzmdysvuhl3h86n5fbxi7@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2bf01f9f 04-Jun-2014 Cyrill Gorcunov <gorcunov@openvz.org>

mm: x86 pgtable: require X86_64 for soft-dirty tracker

Tracking dirty status on 2 level pages requires very ugly macros and
taking into account how old the machines who can operate without PAE
mode only are, lets drop soft dirty tracker from them for code
simplicity (note I can't drop all the macros from 2 level pages by now
since _PAGE_BIT_PROTNONE and _PAGE_BIT_FILE are still used even without
tracker).

Linus proposed to completely rip off softdirty support on x86-32 (even
with PAE) and since for CRIU we're not planning to support native x86-32
mode, lets do that.

(Softdirty tracker is relatively new feature which is mostly used by
CRIU so I don't expect if such API change would cause problems for
userspace).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9c5a3621 04-Jun-2014 Akinobu Mita <akinobu.mita@gmail.com>

x86: enable DMA CMA with swiotlb

The DMA Contiguous Memory Allocator support on x86 is disabled when
swiotlb config option is enabled. So DMA CMA is always disabled on
x86_64 because swiotlb is always enabled. This attempts to support for
DMA CMA with enabling swiotlb config option.

The contiguous memory allocator on x86 is integrated in the function
dma_generic_alloc_coherent() which is .alloc callback in nommu_dma_ops
for dma_alloc_coherent().

x86_swiotlb_alloc_coherent() which is .alloc callback in swiotlb_dma_ops
tries to allocate with dma_generic_alloc_coherent() firstly and then
swiotlb_alloc_coherent() is called as a fallback.

The main part of supporting DMA CMA with swiotlb is that changing
x86_swiotlb_free_coherent() which is .free callback in swiotlb_dma_ops
for dma_free_coherent() so that it can distinguish memory allocated by
dma_generic_alloc_coherent() from one allocated by
swiotlb_alloc_coherent() and release it with dma_generic_free_coherent()
which can handle contiguous memory. This change requires making
is_swiotlb_buffer() global function.

This also needs to change .free callback in the dma_map_ops for amd_gart
and sta2x11, because these dma_ops are also using
dma_generic_alloc_coherent().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4468dd76 04-Jun-2014 Mel Gorman <mgorman@suse.de>

x86: require x86-64 for automatic NUMA balancing

32-bit support for NUMA is an oddity on its own but with automatic NUMA
balancing on top there is a reasonable risk that the CPUPID information
cannot be stored in the page flags. This patch removes support for
automatic NUMA support on 32-bit x86.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c177c81e 04-Jun-2014 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

hugetlb: restrict hugepage_migration_support() to x86_64

Currently hugepage migration is available for all archs which support
pmd-level hugepage, but testing is done only for x86_64 and there're
bugs for other archs. So to avoid breaking such archs, this patch
limits the availability strictly to x86_64 until developers of other
archs get interested in enabling this feature.

Simply disabling hugepage migration on non-x86_64 archs is not enough to
fix the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so let's fix this by checking if hugepage
migration is supported in vma_migratable().

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b1ee5441 07-May-2014 Thomas Gleixner <tglx@linutronix.de>

x86: Implement arch_setup/teardown_hwirq()

This is just a cleanup to get rid of the create/destroy_irq variants
which were designed in hell.

The long term solution for x86 is to switch over to irq domains and
cleanup the whole vector allocation mess.

The generic irq_alloc_hwirqs() interface deliberately prevents
multi-MSI vector allocation to further enforce the irq domain
conversion (aside of the desire to support ioapic hotplug).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154334.482904047@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 9b17aeec 12-May-2014 Alan <alan@acox1-desk.ger.corp.intel.com>

goldfish: Allow 64bit builds

We can now enable the 64bit option for the Goldfish 64bit emulator.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 6b8f0c87 09-May-2014 David E. Box <david.e.box@linux.intel.com>

x86, iosf: Make IOSF driver modular and usable by more drivers

Currently drivers that run on non-IOSF systems (Core/Xeon) can't use the IOSF
driver on SOC's without selecting it which forces an unnecessary and limiting
dependency. Provides dummy functions to allow these modules to conditionally
use the driver on IOSF equipped platforms without impacting their ability to
compile and load on non-IOSF platforms. Build default m to ensure availability
on x86 SOC's.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-2-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 34273f41 04-May-2014 H. Peter Anvin <hpa@zytor.com>

x86, espfix: Make it possible to disable 16-bit support

Embedded systems, which may be very memory-size-sensitive, are
extremely unlikely to ever encounter any 16-bit software, so make it
a CONFIG_EXPERT option to turn off support for any 16-bit software
whatsoever.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com


# 197725de 04-May-2014 H. Peter Anvin <hpa@zytor.com>

x86, espfix: Make espfix64 a Kconfig option, fix UML

Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML
build which had broken due to the non-existence of init_espfix_bsp()
in UML: since UML uses its own Kconfig, this option does not appear in
the UML build.

This also makes it possible to make support for 16-bit segments a
configuration option, for the people who want to minimize the size of
the kernel.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Richard Weinberger <richard@nod.at>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com


# d20642f0 18-Apr-2014 Rob Herring <robh@kernel.org>

x86: move FIX_EARLYCON_MEM kconfig into x86

In preparation to support FIX_EARLYCON_MEM on other arches, make the
option per arch.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 5b7c73e0 07-Apr-2014 Mark Salter <msalter@redhat.com>

x86: use generic early_ioremap

Move x86 over to the generic early ioremap implementation.

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7a017721 25-Feb-2014 AKASHI Takahiro <takahiro.akashi@linaro.org>

audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL

Currently AUDITSYSCALL has a long list of architecture depencency:
depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
for simplicity.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com> (arm)
Acked-by: Richard Guy Briggs <rgb@redhat.com> (audit)
Acked-by: Matt Turner <mattst88@gmail.com> (alpha)
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Eric Paris <eparis@redhat.com>


# d2312e33 17-Mar-2014 Stefani Seibold <stefani@seibold.net>

x86, vdso: Make vsyscall_gtod_data handling x86 generic

This patch move the vsyscall_gtod_data handling out of vsyscall_64.c
into an additonal file vsyscall_gtod.c to make the functionality
available for x86 32 bit kernel.

It also adds a new vsyscall_32.c which setup the VVAR page.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1395094933-14252-2-git-send-email-stefani@seibold.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# b0b49f26 13-Mar-2014 Andy Lutomirski <luto@amacapital.net>

x86, vdso: Remove compat vdso support

The compat vDSO is a complicated hack that's needed to maintain
compatibility with a small range of glibc versions.

This removes it and replaces it with a much simpler hack: a config
option to disable the 32-bit vDSO by default.

This also changes the default value of CONFIG_COMPAT_VDSO to n --
users configuring kernels from scratch almost certainly want that
choice.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/4bb4690899106eb11430b1186d5cc66ca9d1660c.1394751608.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 7d453eee 10-Jan-2014 Matt Fleming <matt.fleming@intel.com>

x86/efi: Wire up CONFIG_EFI_MIXED

Add the Kconfig option and bump the kernel header version so that boot
loaders can check whether the handover code is available if they want.

The xloadflags field in the bzImage header is also updated to reflect
that the kernel supports both entry points by setting both of
XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
guaranteed to be addressable with 32-bits.

Note that no boot loaders should be using the bits set in xloadflags to
decide which entry point to jump to. The entire scheme is based on the
concept that 32-bit bootloaders always jump to ->handover_offset and
64-bit loaders always jump to ->handover_offset + 512. We set both bits
merely to inform the boot loader that it's safe to use the native
handover offset even if the machine type in the PE/COFF header claims
otherwise.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>


# b5660ba7 25-Feb-2014 H. Peter Anvin <hpa@linux.intel.com>

x86, platforms: Remove NUMAQ

The NUMAQ support seems to be unmaintained, remove it.

Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/n/530CFD6C.7040705@zytor.com


# c5f9ee3d 25-Feb-2014 H. Peter Anvin <hpa@linux.intel.com>

x86, platforms: Remove SGI Visual Workstation

The SGI Visual Workstation seems to be dead; remove support so we
don't have to continue maintaining it.

Cc: Andrey Panin <pazke@donpac.ru>
Cc: Michael Reed <mdr@sgi.com>
Link: http://lkml.kernel.org/r/530CFD6C.7040705@zytor.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 2b9c1f03 08-Feb-2014 Ard Biesheuvel <ardb@kernel.org>

x86: align x86 arch with generic CPU modalias handling

The x86 CPU feature modalias handling existed before it was reimplemented
generically. This patch aligns the x86 handling so that it
(a) reuses some more code that is now generic;
(b) uses the generic format for the modalias module metadata entry, i.e., it
now uses 'cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:,XXXX,YYYY' instead of
the 'x86cpu:vendor:VVVV:family:FFFF:model:MMMM:feature:,XXXX,YYYY' that was
used before.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 7cf6c945 11-Feb-2014 David Rientjes <rientjes@google.com>

x86, apic: Remove support for IBM Summit/EXA chipset

There should no longer be any IBM x440 systems or those using the
Summit/EXA chipset out in the wild, so remove support for it.

We've done our due diligence in reaching out to any contact information
listed for this chipset and no indication was given that it should be
kept around.

Signed-off-by: David Rientjes <rientjes@google.com>


# 58f5d2d4 11-Feb-2014 David Rientjes <rientjes@google.com>

x86, apic: Remove support for ia32-based Unisys ES7000

There should no longer be any ia32-based Unisys ES7000 systems out in
the wild, so remove support for it.

We've done our due diligence in reaching out to any contact information
listed for this system and no indication was given that it should be
kept around.

Signed-off-by: David Rientjes <rientjes@google.com>


# edc6bc78 21-Jan-2014 David Cohen <david.a.cohen@linux.intel.com>

x86/intel/mid: Fix X86_INTEL_MID dependencies

This patch fixes the following warning:

warning: (X86_INTEL_MID) selects INTEL_SCU_IPC which has unmet direct dependencies (X86 && X86_PLATFORM_DEVICES && X86_INTEL_MID)

It happens because when selected, X86_INTEL_MID tries to select
INTEL_SCU_IPC regardless all its dependencies are met or not.

This patch fixes it by adding the missing X86_PLATFORM_DEVICES
dependency to X86_INTEL_MID.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1390329699-20782-1-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4a474157 23-Jan-2014 Robert Graffham <psquid@psquid.net>

Kconfig: update flightly outdated CONFIG_SMP documentation

Remove an outdated reference to "most personal computers" having only one
CPU, and change the use of "singleprocessor" and "single processor" in
CONFIG_SMP's documentation to "uniprocessor" across all arches where that
documentation is present.

Signed-off-by: Robert Graffham <psquid@psquid.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cf074402 23-Jan-2014 Ard Biesheuvel <ardb@kernel.org>

firmware/dmi_scan: generalize for use by other archs

This patch makes a couple of changes to the SMBIOS/DMI scanning
code so it can be used on other archs (such as ARM and arm64):
(a) wrap the calls to ioremap()/iounmap(), this allows the use of a
flavor of ioremap() more suitable for random unaligned access;
(b) allow the non-EFI fallback probe into hardcoded physical address
0xF0000 to be disabled.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b8989db9 20-Jan-2014 Alan <gnomes@lxorguk.ukuu.org.uk>

x86, doc, kconfig: Fix dud URL for Microcode data

The actual data lives in the Intel download center, and that ought to also
be a reliable way to continue to find it. Unfortunately the actual URL
needed for doing it directly is about a foot long so give instructions.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/20140120180056.7173.62222.stgit@alan.etchedpixels.co.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 4cb9b00f 16-Dec-2013 David Cohen <david.a.cohen@linux.intel.com>

x86, intel-mid: Remove deprecated X86_MDFLD and X86_WANT_INTEL_MID configs

We want to support all Intel MID (Mobile Internet Device) platforms
with a single config selection. This patch removes deprecated
CONFIG_X86_MDFLD and X86_WANT_INTEL_MID options in favor of having
CONFIG_X86_INTEL_MID only.

Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1387244246-20714-1-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# da2b6fb9 10-Dec-2013 Kees Cook <keescook@chromium.org>

x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET

The help text for RANDOMIZE_BASE_MAX_OFFSET was confusing. This has been
clarified, and updated to be an export-only tunable.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20131210202745.GA2961@www.outflux.net
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# bad5fa63 01-Dec-2013 Borislav Petkov <bp@suse.de>

x86, microcode: Move to a proper location

We've grown a bunch of microcode loader files all prefixed with
"microcode_". They should be under cpu/ because this is strictly
CPU-related functionality so do that and drop the prefix since they're
in their own directory now which gives that prefix. :)

While at it, drop MICROCODE_INTEL_LIB config item and stash the
functionality under CONFIG_MICROCODE_INTEL as it was its only user.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>


# 46184415 08-Jan-2014 David E. Box <david.e.box@linux.intel.com>

arch: x86: New MailBox support driver for Intel SOC's

Current Intel SOC cores use a MailBox Interface (MBI) to provide access to
configuration registers on devices (called units) connected to the system
fabric. This is a support driver that implements access to this interface on
those platforms that can enumerate the device using PCI. Initial support is for
BayTrail, for which port definitons are provided. This is a requirement for
implementing platform specific features (e.g. RAPL driver requires this to
perform platform specific power management using the registers in PUNIT).
Dependant modules should select IOSF_MBI in their respective Kconfig
configuraiton. Serialized access is handled by all exported routines with
spinlocks.

The API includes 3 functions for access to unit registers:

int iosf_mbi_read(u8 port, u8 opcode, u32 offset, u32 *mdr)
int iosf_mbi_write(u8 port, u8 opcode, u32 offset, u32 mdr)
int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask)

port: indicating the unit being accessed
opcode: the read or write port specific opcode
offset: the register offset within the port
mdr: the register data to be read, written, or modified
mask: bit locations in mdr to change

Returns nonzero on error

Note: GPU code handles access to the GFX unit. Therefore access to that unit
with this driver is disallowed to avoid conflicts.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1389216471-734-1-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>


# 5e2c18c0 01-Jan-2014 Mark Salter <msalter@redhat.com>

Input: i8042 - select ARCH_MIGHT_HAVE_PC_SERIO on x86

Architectures which might use an i8042 for serial IO to keyboard,
mouse, etc should select ARCH_MIGHT_HAVE_PC_SERIO.

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>


# 19952a92 19-Dec-2013 Kees Cook <keescook@chromium.org>

stackprotector: Unify the HAVE_CC_STACKPROTECTOR logic between architectures

Instead of duplicating the CC_STACKPROTECTOR Kconfig and
Makefile logic in each architecture, switch to using
HAVE_CC_STACKPROTECTOR and keep everything in one place. This
retains the x86-specific bug verification scripts.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1387481759-14535-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# be5e610c 18-Nov-2013 Peter Zijlstra <peterz@infradead.org>

math64: Add mul_u64_u32_shr()

Introduce mul_u64_u32_shr() as proposed by Andy a while back; it
allows using 64x64->128 muls on 64bit archs and recent GCC
which defines __SIZEOF_INT128__ and __int128.

(This new method will be used by the scheduler.)

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-hxjoeuzmrcaumR0uZwjpe2pv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5065a706 30-Nov-2013 Masanari Iida <standby24x7@gmail.com>

treewide: Fix typo in Kconfig

Correct spelling typo in Kconfig.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 0a06ff06 14-Nov-2013 Christoph Hellwig <hch@infradead.org>

kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS

We've switched over every architecture that supports SMP to it, so
remove the new useless config variable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9491846f 14-Nov-2013 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86, mm: enable split page table lock for PMD level

Enable PMD split page table lock for X86_64 and PAE.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a653f356 11-Nov-2013 Kees Cook <keescook@chromium.org>

x86, kaslr: Mix entropy sources together as needed

Depending on availability, mix the RDRAND and RDTSC entropy together with
XOR. Only when neither is available should the i8254 be used. Update
the Kconfig documentation to reflect this. Additionally, since bits
used for entropy is masked elsewhere, drop the needless masking in
the get_random_long(). Similarly, use the entire TSC, not just the low
32 bits.

Finally, to improve the starting entropy, do a simple hashing of a
build-time versions string and the boot-time boot_params structure for
some additional level of unpredictability.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20131111222839.GA28616@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# b53b5eda 05-Nov-2013 Josh Boyer <jwboyer@redhat.com>

x86/cpu: Increase max CPU count to 8192

The MAXSMP option is intended to enable silly large numbers of
CPUs for testing purposes. The current value of 4096 isn't very
silly any longer as there are actual SGI machines that approach
6096 CPUs when taking HT into account.

Increase the value to a nice round 8192 to account for this and
allow for short term future increases.

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: prarit@redhat.com
Cc: Russ Anderson <rja@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131105143816.GK9944@hansolo.jdub.homelinux.org
[ Tweaked it so that MAXSMP simply sets the maximum of the normal range. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# bb61ccc7 05-Nov-2013 Josh Boyer <jwboyer@redhat.com>

x86/cpu: Allow higher NR_CPUS values

The current range for SMP configs is 2 - 512 CPUs, or a full
4096 in the case of MAXSMP. There are machines that have 1024
CPUs in them today and configuring a kernel for that means you
are forced to set MAXSMP. This adds additional unnecessary
overhead. While that overhead might be considered tiny for
large machines, it isn't necessarily so if you are building a
kernel that runs across a wide variety of machines.

To cover the range of more common machines today, we allow
NR_CPUS to be up to 4096 when CPUMASK_OFFSTACK is enabled.

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: prarit@redhat.com
Cc: Russ Anderson <rja@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131105143728.GJ9944@hansolo.jdub.homelinux.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4c4e4f61 21-Oct-2013 Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

x86/locking/kconfig: Update paravirt spinlock Kconfig description

Since the paravirt spinlock optimizations went into the v3.12 kernel
we have a very good performance benefit for paravirtualized KVM / Xen
kernels. Also we no longer suffer from 5% side effect on native
kernel that is mentioned in the Kconfig entry.

So update the Kconfig entry accordingly.

pvspinlock benefit on KVM link:

https://lkml.org/lkml/2013/8/6/178

Attilio's tests on native kernel impact:

http://blog.xen.org/index.php/2012/05/11/benchmarking-the-new-pv-ticketlock-implementation/

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: <konrad.wilk@oracle.com>
Cc: <linux@eikelenboom.it>
Cc: <gleb@redhat.com>
Cc: <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382371508-3843-1-git-send-email-raghavendra.kt@linux.vnet.ibm.com
[ Updated the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 77fbbc81 07-Oct-2013 Mark Salter <msalter@redhat.com>

x86: select ARCH_MIGHT_HAVE_PC_PARPORT

Architectures which support CONFIG_PARPORT_PC should select
ARCH_MIGHT_HAVE_PC_PARPORT.

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: x86@kernel.org


# 80030e3d 13-Oct-2013 Borislav Petkov <bp@suse.de>

x86/microcode: Correct Kconfig dependencies

I have a randconfig here which has enabled only

CONFIG_MICROCODE=y
CONFIG_MICROCODE_OLD_INTERFACE=y

with both

# CONFIG_MICROCODE_INTEL is not set
# CONFIG_MICROCODE_AMD is not set

off. Which makes building the microcode functionality a little
pointless. Don't do that in such cases then.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1381682189-14470-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6145cfe3 10-Oct-2013 Kees Cook <keescook@chromium.org>

x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64

On 64-bit, this raises the maximum location to -1 GiB (from -1.5 GiB),
the upper limit currently, since the kernel fixmap page mappings need
to be moved to use the other 1 GiB (which would be the theoretical
limit when building with -mcmodel=kernel).

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-7-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 8ab3820f 10-Oct-2013 Kees Cook <keescook@chromium.org>

x86, kaslr: Return location from decompress_kernel

This allows decompress_kernel to return a new location for the kernel to
be relocated to. Additionally, enforces CONFIG_PHYSICAL_START as the
minimum relocation position when building with CONFIG_RELOCATABLE.

With CONFIG_RANDOMIZE_BASE set, the choose_kernel_location routine
will select a new location to decompress the kernel, though here it is
presently a no-op. The kernel command line option "nokaslr" is introduced
to bypass these routines.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-3-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# ced3c42c 06-Oct-2013 Ingo Molnar <mingo@kernel.org>

x86/iommu: Clean up the CONFIG_GART_IOMMU config option a bit

Improve the explanation of this config option.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/n/tip-gUMmysvsbl3mccbyf6olmxqg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 38901f1c 04-Oct-2013 Andi Kleen <ak@linux.intel.com>

x86/iommu: Don't make AMD_GART depend on EXPERT and default y

The AMD_GART driver was made EXPERT/EMBEDDED a long time
ago to avoid unbootable 64bit systems with 32bit only devices.

This was before swiotlb was there, which does the job
of this fallback today. SWIOTLB is always on, so systems
should always boot.

The drawback is that every system has to compile that
driver in (it cannot be a module).

Also:
- Newer AMD CPUs (the APUs) don't seem to have AMD_GART support
at all anymore.

- Newer AMD platforms have a much better real IOMMU

- The AMD GART driver was never very good (lots of overhead,
e.g. in flushing due to some workarounds) and it's doubtful it's
really better than SWIOTLB.

- On older K8 systems it didn't even work with all chipsets.

- The 32bit device bounce buffer case should be rare/
non performance critical these days anyways.

- On non AMD systems it is not needed at all.

So drop the EXPERT dependency on AMD_GART and remove the
default y. The driver can be still compiled in, just
it's an explicit decision now, and people who don't want
it can unselect it.

I also clarified the description a bit.

This allows to save ~8K text on most modern x86-64 systems.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1380922676-23007-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0dbc6078 03-Oct-2013 Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

x86, build, pci: Fix PCI_MSI build on !SMP

Commit ebd97be635 ('PCI: remove ARCH_SUPPORTS_MSI kconfig option')
removed the ARCH_SUPPORTS_MSI option which architectures could select
to indicate that they support MSI. Now, all architectures are supposed
to build fine when MSI support is enabled: instead of having the
architecture tell *when* MSI support can be used, it's up to the
architecture code to ensure that MSI support can be enabled.

On x86, commit ebd97be635 removed the following line:

select ARCH_SUPPORTS_MSI if (X86_LOCAL_APIC && X86_IO_APIC)

Which meant that MSI support was only available when the local APIC
and I/O APIC were enabled. While this is always true on SMP or x86-64,
it is not necessarily the case on i386 !SMP.

The below patch makes sure that the local APIC and I/O APIC support is
always enabled when MSI support is enabled. To do so, it:

* Ensures the X86_UP_APIC option is not visible when PCI_MSI is
enabled. This is the option that allows, on UP machines, to enable
or not the APIC support. It is already not visible on SMP systems,
or x86-64 systems, for example. We're simply also making it
invisible on i386 MSI systems.

* Ensures that the X86_LOCAL_APIC and X86_IO_APIC options are 'y'
when PCI_MSI is enabled.

Notice that this change requires a change in drivers/iommu/Kconfig to
avoid a recursive Kconfig dependencey. The AMD_IOMMU option selects
PCI_MSI, but was depending on X86_IO_APIC. This dependency is no
longer needed: as soon as PCI_MSI is selected, the presence of
X86_IO_APIC is guaranteed. Moreover, the AMD_IOMMU already depended on
X86_64, which already guaranteed that X86_IO_APIC was enabled, so this
dependency was anyway redundant.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Link: http://lkml.kernel.org/r/1380794354-9079-1-git-send-email-thomas.petazzoni@free-electrons.com
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# a2cd11f7 24-Sep-2013 Frederic Weisbecker <fweisbec@gmail.com>

x86: Tell about irq stack coverage

x86-64 runs irq_exit() under the irq stack. So it can afford
to run softirqs in hardirq exit without the need to switch
the stacks. The hardirq stack is good enough for that.

Now x86-64 runs softirqs in the hardirq stack anyway, so what we
mostly skip is some needless per cpu refcounting updates there.

x86-32 is not concerned because it only runs the irq handler on
the irq stack.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>


# 4172fe2f 22-Sep-2013 Roy Franz <roy.franz@linaro.org>

EFI stub documentation updates

Move efi-stub.txt out of x86 directory and into common directory
in preparation for adding ARM EFI stub support.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>


# 1cad5e9a 29-Aug-2013 Toshi Kani <toshi.kani@hp.com>

hotplug / x86: Disable ARCH_CPU_PROBE_RELEASE on x86

Commit d7c53c9e enabled ARCH_CPU_PROBE_RELEASE on x86 in order to
serialize CPU online/offline operations. Although it is the config
option to enable CPU hotplug test interfaces, probe & release, it is
also the option to enable cpu_hotplug_driver_lock() as well. Therefore,
this option had to be enabled on x86 with dummy arch_cpu_probe() and
arch_cpu_release().

Since then, lock_device_hotplug() was introduced to serialize CPU
online/offline & hotplug operations. Therefore, this config option
is no longer required for the serialization. This patch disables
this config option on x86 and revert the changes made by commit
d7c53c9e.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0f531431 13-Sep-2013 Mathias Nyman <mathias.nyman@linux.intel.com>

x86/intel/lpss: Add pin control support to Intel low power subsystem

x86 chips with LPSS (low power subsystem) such as Lynxpoint and
Baytrail have SoC like peripheral support and controllable pins.

At the moment, Baytrail needs the pinctrl-baytrail driver to let
peripherals control their gpio resources, but more pincontrol
functions such as pin muxing and grouping are possible to add
later.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: http://lkml.kernel.org/r/1379080949-21734-1-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0244ad00 30-Aug-2013 Martin Schwidefsky <schwidefsky@de.ibm.com>

Remove GENERIC_HARDIRQ config option

After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# bc08b449 02-Sep-2013 Linus Torvalds <torvalds@linux-foundation.org>

lockref: implement lockless reference count updates using cmpxchg()

Instead of taking the spinlock, the lockless versions atomically check
that the lock is not taken, and do the reference count update using a
cmpxchg() loop. This is semantically identical to doing the reference
count update protected by the lock, but avoids the "wait for lock"
contention that you get when accesses to the reference count are
contended.

Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
Even when the lockref reference counts are updated atomically with
cmpxchg, the fact that they also verify the state of the spinlock means
that the lockless updates can never happen while somebody else holds the
spinlock.

So while "lockref_put_or_lock()" looks a lot like just another name for
"atomic_dec_and_lock()", and both optimize to lockless updates, they are
fundamentally different: the decrement done by atomic_dec_and_lock() is
truly independent of any lock (as long as it doesn't decrement to zero),
so a locked region can still see the count change.

The lockref structure, in contrast, really is a *locked* reference
count. If you hold the spinlock, the reference count will be stable and
you can modify the reference count without using atomics, because even
the lockless updates will see and respect the state of the lock.

In order to enable the cmpxchg lockless code, the architecture needs to
do three things:

(1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
in an aligned u64, and have a "cmpxchg()" implementation that works
on such a u64 data type.

(2) define a helper function to test for a spinlock being unlocked
("arch_spin_value_unlocked()")

(3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
Kconfig file.

This enables it for x86-64 (but not 32-bit, we'd need to make sure
cmpxchg() turns into the proper cmpxchg8b in order to enable it for
32-bit mode).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bf220695 20-Aug-2013 Geert Uytterhoeven <geert@linux-m68k.org>

Kconfig: Remove hotplug enable hints in CONFIG_KEXEC help texts

commit 40b313608ad4ea655addd2ec6cdd106477ae8e15 ("Finally eradicate
CONFIG_HOTPLUG") removed remaining references to CONFIG_HOTPLUG, but missed
a few plain English references in the CONFIG_KEXEC help texts.

Remove them, too.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# ebd97be6 09-Aug-2013 Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

PCI: remove ARCH_SUPPORTS_MSI kconfig option

Now that we have weak versions for each of the PCI MSI architecture
functions, we can actually build the MSI support for all platforms,
regardless of whether they provide or not architecture-specific
versions of those functions. For this reason, the ARCH_SUPPORTS_MSI
hidden kconfig boolean becomes useless, and this patch gets rid of it.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Daniel Price <daniel.price@gmail.com>
Tested-by: Thierry Reding <thierry.reding@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David S. Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>


# 1e20eb85 09-Aug-2013 Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

kvm guest: Add configuration support to enable debug information for KVM Guests

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1376058122-8248-14-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 8db73266 09-Aug-2013 Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks

The code size expands somewhat, and its better to just call
a function rather than inline it.

Thanks Jeremy for original version of ARCH_NOINLINE_SPIN_UNLOCK config patch,
which is simplified.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1376058122-8248-3-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# a0215061 08-Jul-2013 Kees Cook <keescook@chromium.org>

x86, relocs: Move ELF relocation handling to C

Moves the relocation handling into C, after decompression. This requires
that the decompressed size is passed to the decompression routine as
well so that relocations can be found. Only kernels that need relocation
support will use the code (currently just x86_32), but this is laying
the ground work for 64-bit using it in support of KASLR.

Based on work by Neill Clift and Michael Davidson.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130708161517.GA4832@www.outflux.net
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# e3263ab3 02-Aug-2013 David Herrmann <dh.herrmann@gmail.com>

x86: provide platform-devices for boot-framebuffers

The current situation regarding boot-framebuffers (VGA, VESA/VBE, EFI) on
x86 causes troubles when loading multiple fbdev drivers. The global
"struct screen_info" does not provide any state-tracking about which
drivers use the FBs. request_mem_region() theoretically works, but
unfortunately vesafb/efifb ignore it due to quirks for broken boards.

Avoid this by creating a platform framebuffer devices with a pointer
to the "struct screen_info" as platform-data. Drivers can now create
platform-drivers and the driver-core will refuse multiple drivers being
active simultaneously.

We keep the screen_info available for backwards-compatibility. Drivers
can be converted in follow-up patches.

Different devices are created for VGA/VESA/EFI FBs to allow multiple
drivers to be loaded on distro kernels. We create:
- "vesa-framebuffer" for VBE/VESA graphics FBs
- "efi-framebuffer" for EFI FBs
- "platform-framebuffer" for everything else
This allows to load vesafb, efifb and others simultaneously and each
picks up only the supported FB types.

Apart from platform-framebuffer devices, this also introduces a
compatibility option for "simple-framebuffer" drivers which recently got
introduced for OF based systems. If CONFIG_X86_SYSFB is selected, we
try to match the screen_info against a simple-framebuffer supported
format. If we succeed, we create a "simple-framebuffer" device instead
of a platform-framebuffer.
This allows to reuse the simplefb.c driver across architectures and also
to introduce a SimpleDRM driver. There is no need to have vesafb.c,
efifb.c, simplefb.c and more just to have architecture specific quirks
in their setup-routines.

Instead, we now move the architecture specific quirks into x86-setup and
provide a generic simple-framebuffer. For backwards-compatibility (if
strange formats are used), we still allow vesafb/efifb to be loaded
simultaneously and pick up all remaining devices.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1375445127-15480-4-git-send-email-dh.herrmann@gmail.com
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# a0842b70 19-Jul-2013 Toshi Kani <toshi.kani@hp.com>

mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default

CONFIG_ARCH_MEMORY_PROBE enables the
/sys/devices/system/memory/probe interface, which allows a given
memory address to be hot-added as follows:

# echo start_address_of_new_memory > /sys/devices/system/memory/probe

(See Documentation/memory-hotplug.txt for more details.)

This probe interface is required on powerpc. On x86, however,
ACPI notifies a memory hotplug event to the kernel, which
performs its hotplug operation as the result.

Therefore, regular users do not need this interface on x86. This probe
interface is also error-prone and misleading that the kernel blindly
adds a given memory address without checking if the memory is present
on the system; no probing is done despite of its name.

The kernel crashes when a user requests to online a memory block
that is not present on the system. This interface is currently
used for testing as it can fake a hotplug event.

This patch disables CONFIG_ARCH_MEMORY_PROBE by default on x86,
adds its Kconfig menu entry on x86, and clarifies its use in
Documentation/ memory-hotplug.txt.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: linux-mm@kvack.org
Cc: dave@sr71.net
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: tangchen@cn.fujitsu.com
Cc: vasilis.liaskovitis@profitbricks.com
Link: http://lkml.kernel.org/r/1374256068-26016-1-git-send-email-toshi.kani@hp.com
[ Edited it slightly. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ea8596bb 18-Jul-2013 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions

Since introducing the text_poke_bp() for all text_poke_smp*()
callers, text_poke_smp*() are now unused. This patch basically
reverts:

3d55cc8a058e ("x86: Add text_poke_smp for SMP cross modifying code")
7deb18dcf047 ("x86: Introduce text_poke_smp_batch() for batch-code modifying")

and related commits.

This patch also fixes a Kconfig dependency issue on STOP_MACHINE
in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Borislav Petkov <bpetkov@suse.de>
Link: http://lkml.kernel.org/r/20130718114753.26675.18714.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f9b493ac 08-Jul-2013 Kyungsik Lee <kyungsik.lee@lge.com>

arm: add support for LZ4-compressed kernel

Integrates the LZ4 decompression code to the arm pre-boot code.

Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d1a1dc0b 01-Jul-2013 Dave Hansen <dave@linux.vnet.ibm.com>

consolidate per-arch stack overflow debugging options

Original posting:

http://lkml.kernel.org/r/20121214184202.F54094D9@kernel.stglabs.ibm.com

Several architectures have similar stack debugging config options.
They all pretty much do the same thing, some with slightly
differing help text.

This patch changes the architectures to instead enable a Kconfig
boolean, and then use that boolean in the generic Kconfig.debug
to present the actual menu option. This removes a bunch of
duplication and adds consistency across arches.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com> [for tile]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fdf90abc 03-Jul-2013 Alexandre Bounine <alexandre.bounine@idt.com>

rapidio: add modular build option for the subsystem core

Add a configuration option to build RapidIO subsystem core code as a
loadable kernel module. Currently this option is available only for
x86-based platforms, with the additional patch for PowerPC planned to be
provided later.

This patch replaces kernel command line parameter "riohdid=" with its
module-specific analog "rapidio.hdid=".

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Cc: Stef van Os <stef.van.os@Prodrive.nl>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0f8975ec 03-Jul-2013 Pavel Emelyanov <xemul@parallels.com>

mm: soft-dirty bits for user memory changes tracking

The soft-dirty is a bit on a PTE which helps to track which pages a task
writes to. In order to do this tracking one should

1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs)
2. Wait some time.
3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries)

To do this tracking, the writable bit is cleared from PTEs when the
soft-dirty bit is. Thus, after this, when the task tries to modify a
page at some virtual address the #PF occurs and the kernel sets the
soft-dirty bit on the respective PTE.

Note, that although all the task's address space is marked as r/o after
the soft-dirty bits clear, the #PF-s that occur after that are processed
fast. This is so, since the pages are still mapped to physical memory,
and thus all the kernel does is finds this fact out and puts back
writable, dirty and soft-dirty bits on the PTE.

Another thing to note, is that when mremap moves PTEs they are marked
with soft-dirty as well, since from the user perspective mremap modifies
the virtual memory at mremap's new address.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cb7b8023 23-Jun-2013 Ben Hutchings <ben@decadent.org.uk>

x86/platform: Make X86_GOLDFISH depend on X86_EXTENDED_PLATFORM

All non-PC platforms are supposed to be dependent on this
option.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Jun Nakajima <jnakajim@gmail.com>
Link: http://lkml.kernel.org/n/tip-Bcihhqhstm67fchjnkxoiJbu@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d1603990 18-Jun-2013 Randy Dunlap <rdunlap@infradead.org>

x86: fix build error and kconfig for ia32_emulation and binfmt

Fix kconfig warning and build errors on x86_64 by selecting BINFMT_ELF
when COMPAT_BINFMT_ELF is being selected.

warning: (IA32_EMULATION) selects COMPAT_BINFMT_ELF which has unmet direct dependencies (COMPAT && BINFMT_ELF)

fs/built-in.o: In function `elf_core_dump':
compat_binfmt_elf.c:(.text+0x3e093): undefined reference to `elf_core_extra_phdrs'
compat_binfmt_elf.c:(.text+0x3ebcd): undefined reference to `elf_core_extra_data_size'
compat_binfmt_elf.c:(.text+0x3eddd): undefined reference to `elf_core_write_extra_phdrs'
compat_binfmt_elf.c:(.text+0x3f004): undefined reference to `elf_core_write_extra_data'

[ hpa: This was sent to me for -next but it is a low risk build fix ]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: http://lkml.kernel.org/r/51C0B614.5000708@infradead.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 53313b2c 30-Apr-2013 Steve Capper <steve.capper@linaro.org>

x86: mm: Remove general hugetlb code from x86.

huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d have
already been copied over to mm.

This patch removes the x86 copies of these functions and activates
the general ones by enabling:
CONFIG_ARCH_WANT_GENERAL_HUGETLB

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>


# cfe28c5d 29-Apr-2013 Steve Capper <steve.capper@linaro.org>

x86: mm: Remove x86 version of huge_pmd_share.

The huge_pmd_share code has been copied over to mm/hugetlb.c to
make it accessible to other architectures.

Remove the x86 copy of the huge_pmd_share code and enable the
ARCH_WANT_HUGE_PMD_SHARE config flag. That way we reference the
general one.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>


# 40b31360 20-May-2013 Stephen Rothwell <sfr@canb.auug.org.au>

Finally eradicate CONFIG_HOTPLUG

Ever since commit 45f035ab9b8f ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off. Remove all the remaining references to it.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 6b3389ac 31-May-2013 Jacob Shin <jacob.shin@amd.com>

x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m

Fix section mismatch warnings on microcode_amd_early.
Compile error occurs when CONFIG_MICROCODE=m, change so that early
loading depends on microcode_core.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130531150241.GA12006@jshin-Toonie
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 757885e9 30-May-2013 Jacob Shin <jacob.shin@amd.com>

x86, microcode, amd: Early microcode patch loading support for AMD

Add early microcode patch loading support for AMD.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-5-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>


# b4f711ee 24-Apr-2013 John Stultz <john.stultz@linaro.org>

time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
which enables some minor compile time optimization to avoid
uncessary code in mostly the suspend/resume path could cause
problems for userland.

In particular, the dependency for RTC_HCTOSYS on
!ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
twice and simplifies suspend/resume, has the side effect
of causing the /sys/class/rtc/rtcN/hctosys flag to always be
zero, and this flag is commonly used by udev to setup the
/dev/rtc symlink to /dev/rtcN, which can cause pain for
older applications.

While the udev rules could use some work to be less fragile,
breaking userland should strongly be avoided. Additionally
the compile time optimizations are fairly minor, and the code
being optimized is likely to be reworked in the future, so
lets revert this change.

Reported-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: stable <stable@vger.kernel.org> #3.9
Cc: Feng Tang <feng.tang@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 446f24d1 30-Apr-2013 Stephen Boyd <sboyd@codeaurora.org>

Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS

The help text for this config is duplicated across the x86, parisc, and
s390 Kconfig.debug files. Arnd Bergman noted that the help text was
slightly misleading and should be fixed to state that enabling this
option isn't a problem when using pre 4.4 gcc.

To simplify the rewording, consolidate the text into lib/Kconfig.debug
and modify it there to be more explicit about when you should say N to
this config.

Also, make the text a bit more generic by stating that this option
enables compile time checks so we can cover architectures which emit
warnings vs. ones which emit errors. The details of how an
architecture decided to implement the checks isn't as important as the
concept of compile time checking of copy_from_user() calls.

While we're doing this, remove all the copy_from_user_overflow() code
that's duplicated many times and place it into lib/ so that any
architecture supporting this option can get the function for free.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d190e819 17-Apr-2013 Thomas Gleixner <tglx@linutronix.de>

idle: Remove GENERIC_IDLE_LOOP config switch

All archs are converted over. Remove the config switch and the
fallback code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# f6ce5002 16-Apr-2013 Sergey Vlasov <vsu@altlinux.ru>

x86/Kconfig: Make EFI select UCS2_STRING

The commit "efi: Distinguish between "remaining space" and actually used
space" added usage of ucs2_*() functions to arch/x86/platform/efi/efi.c,
but the only thing which selected UCS2_STRING was EFI_VARS, which is
technically optional and can be built as a module.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>


# 7fd2bf3d 28-Mar-2013 Alexandre Courbot <acourbot@nvidia.com>

Remove GENERIC_GPIO config option

GENERIC_GPIO has been made equivalent to GPIOLIB in architecture code
and all driver code has been switch to depend on GPIOLIB. It is thus
safe to have GENERIC_GPIO removed.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>


# 781b0e87 21-Mar-2013 Paul Bolle <pebolle@tiscali.nl>

idle: Remove unused ARCH_HAS_DEFAULT_IDLE

The Kconfig symbol ARCH_HAS_DEFAULT_IDLE is unused. Commit
a0bfa1373859e9d11dc92561a8667588803e42d8 ("cpuidle: stop
depending on pm_idle") removed the only place were it was
actually used. But it did not remove its Kconfig entries (for sh
and x86). Remove those two entries now.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Len Brown <len.brown@intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Link: http://lkml.kernel.org/r/1363869683.1390.134.camel@x61.thuisdomein
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7d1a9417 21-Mar-2013 Thomas Gleixner <tglx@linutronix.de>

x86: Use generic idle loop

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130321215235.486594473@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org


# 3195ef59 13-Feb-2013 Prarit Bhargava <prarit@redhat.com>

x86: Do full rtc synchronization with ntp

Every 11 minutes ntp attempts to update the x86 rtc with the current
system time. Currently, the x86 code only updates the rtc if the system
time is within +/-15 minutes of the current value of the rtc. This
was done originally to avoid setting the RTC if the RTC was in localtime
mode (common with Windows dualbooting). Other architectures do a full
synchronization and now that we have better infrastructure to detect
when the RTC is in localtime, there is no reason that x86 should be
software limited to a 30 minute window.

This patch changes the behavior of the kernel to do a full synchronization
(year, month, day, hour, minute, and second) of the rtc when ntp requests
a synchronization between the system time and the rtc.

I've used the RTC library functions in this patchset as they do all the
required bounds checking.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: x86@kernel.org
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[jstultz: Tweak commit message, fold in build fix found by fengguang
Also add select RTC_LIB to X86, per new dependency, as found by prarit]
Signed-off-by: John Stultz <john.stultz@linaro.org>


# 4febd95a 06-Mar-2013 Stephen Rothwell <sfr@canb.auug.org.au>

Select VIRT_TO_BUS directly where needed

In commit 887cbce0adea ("arch Kconfig: centralise ARCH_NO_VIRT_TO_BUS")
I introduced the config sybmol HAVE_VIRT_TO_BUS and selected that where
needed. I am not sure what I was thinking. Instead, just directly
select VIRT_TO_BUS where it is needed.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6276a074 04-Mar-2013 Borislav Petkov <bp@suse.de>

x86: Make Linux guest support optional

Put all config options needed to run Linux as a guest behind a
CONFIG_HYPERVISOR_GUEST menu so that they don't get built-in by default
but be selectable by the user. Also, make all units which depend on
x86_hyper, depend on this new symbol so that compilation doesn't fail
when CONFIG_HYPERVISOR_GUEST is disabled but those units assume its
presence.

Sort options in the new HYPERVISOR_GUEST menu, adapt config text and
drop redundant select.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1362428421-9244-3-git-send-email-bp@alien8.de
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 2c59cad6 04-Mar-2013 Borislav Petkov <bp@suse.de>

x86, Kconfig: Move PARAVIRT_DEBUG into the paravirt menu

This should be under the PARAVIRT_GUEST menu.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1362428421-9244-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 887cbce0 27-Feb-2013 Stephen Rothwell <sfr@canb.auug.org.au>

arch Kconfig: centralise CONFIG_ARCH_NO_VIRT_TO_BUS

Change it to CONFIG_HAVE_VIRT_TO_BUS and set it in all architecures
that already provide virt_to_bus().

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: H Hartley Sweeten <hartleys@visionengravers.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dd8af076 09-Feb-2013 Len Brown <len.brown@intel.com>

APM idle: register apm_cpu_idle via cpuidle

Update APM to register its local idle routine with cpuidle.

This allows us to stop exporting pm_idle to modules on x86.

The Kconfig sub-option, APM_CPU_IDLE, now depends on on CPU_IDLE.

Compile-tested only.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Jiri Kosina <jkosina@suse.cz>


# d64008a8 25-Nov-2012 Al Viro <viro@zeniv.linux.org.uk>

burying unused conditionals

__ARCH_WANT_SYS_RT_SIGACTION,
__ARCH_WANT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore
CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} -
can be assumed always set.


# 5b3eb3ad 25-Dec-2012 Al Viro <viro@zeniv.linux.org.uk>

x86: switch to generic old sigaction

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 29fd4480 25-Dec-2012 Al Viro <viro@zeniv.linux.org.uk>

x86: switch to generic compat rt_sigaction()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 15ce1f71 25-Dec-2012 Al Viro <viro@zeniv.linux.org.uk>

x86,um: switch to generic old sigsuspend()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 7b83d1a2 25-Dec-2012 Al Viro <viro@zeniv.linux.org.uk>

x86: switch to generic compat rt_sigqueueinfo()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# f45adb04 25-Dec-2012 Al Viro <viro@zeniv.linux.org.uk>

x86: switch to generic compat rt_sigpending()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# f03574f2 30-Jan-2013 Dave Hansen <dave@linux.vnet.ibm.com>

x86-32, mm: Rip out x86_32 NUMA remapping code

This code was an optimization for 32-bit NUMA systems.

It has probably been the cause of a number of subtle bugs over
the years, although the conditions to excite them would have
been hard to trigger. Essentially, we remap part of the kernel
linear mapping area, and then sometimes part of that area gets
freed back in to the bootmem allocator. If those pages get
used by kernel data structures (say mem_map[] or a dentry),
there's no big deal. But, if anyone ever tried to use the
linear mapping for these pages _and_ cared about their physical
address, bad things happen.

For instance, say you passed __GFP_ZERO to the page allocator
and then happened to get handed one of these pages, it zero the
remapped page, but it would make a pte to the _old_ page.
There are probably a hundred other ways that it could screw
with things.

We don't need to hang on to performance optimizations for
these old boxes any more. All my 32-bit NUMA systems are long
dead and buried, and I probably had access to more than most
people.

This code is causing real things to break today:

https://lkml.org/lkml/2013/1/9/376

I looked in to actually fixing this, but it requires surgery
to way too much brittle code, as well as stuff like
per_cpu_ptr_to_phys().

[ hpa: Cc: this for -stable, since it is a memory corruption issue.
However, an alternative is to simply mark NUMA as depends BROKEN
rather than EXPERIMENTAL in the X86_32 subclause... ]

Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>


# da76f64e 21-Dec-2012 Fenghua Yu <fenghua.yu@intel.com>

x86/Kconfig: Make early microcode loading a configuration feature

MICROCODE_INTEL_LIB, MICROCODE_INTEL_EARLY, and MICROCODE_EARLY are three new
configurations to enable or disable the feature.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-13-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 6f16eebe 25-Jan-2013 John Stultz <john.stultz@linaro.org>

timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK

Jason pointed out the HAS_PERSISTENT_CLOCK name isn't
quite accurate for the config, as some systems may have
the persistent_clock in some cases, but not always.

So change the config name to the more clear
ALWAYS_USE_PERSISTENT_CLOCK.

Signed-off-by: John Stultz <john.stultz@linaro.org>


# 83a57a4d 19-Dec-2012 David Woodhouse <dwmw2@infradead.org>

x86: Enable ARCH_USE_BUILTIN_BSWAP

With -mmovbe enabled (implicit with -march=atom), this allows the
compiler to use the movbe instruction. This doesn't have a significant
effect on code size (unlike on PowerPC), because the movbe instruction
actually takes as many bytes to encode as a simple mov and a bswap. But
for Atom in particular I believe it should give a performance win over
the mov+bswap alternative. That was kind of why movbe was invented in
the first place, after all...

I've done basic functionality testing with IPv6 and Legacy IP, but no
performance testing. The EFI firmware on my test box unfortunately no
longer starts up.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1355966180.18919.102.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 7d029125 04-Jan-2013 Vivien Didelot <vivien.didelot@gmail.com>

x86: Add TS-5500 platform support

The Technologic Systems TS-5500 is an x86-based (AMD Elan SC520)
single board computer. This driver registers most of its devices
and exposes sysfs attributes for information such as jumpers'
state or presence of some of its options.

This driver currently registers the TS-5500 platform, its
on-board LED, 2 pin blocks (GPIO) and its analog/digital
converter. It can be extended to support other Technologic
Systems products, such as the TS-5600.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Savoir-faire Linux Inc. <kernel@savoirfairelinux.com>
Link: http://lkml.kernel.org/r/1357334294-12760-1-git-send-email-vivien.didelot@savoirfairelinux.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ed8e47fe 18-Dec-2012 Randy Dunlap <rdunlap@infradead.org>

x86/olpc: Fix olpc-xo1-sci.c build errors

Fix build errors when CONFIG_INPUT=m. This is not pretty, but
all of the OLPC kconfig options are bool instead of tristate.

arch/x86/built-in.o: In function `send_lid_state':
olpc-xo1-sci.c:(.text+0x1d323): undefined reference to `input_event'
olpc-xo1-sci.c:(.text+0x1d338): undefined reference to `input_event'
...

In the long run, fixing this driver kconfig to be tristate
instead of bool would be a very good change.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Chris Ball <cjb@laptop.org>
Cc: Jon Nettleton <jon.nettleton@gmail.com>
Cc: Daniel Drake <dsd@laptop.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2c922cd0 22-Jan-2013 Kees Cook <keescook@chromium.org>

x86/cpu/hotplug: Remove CONFIG_EXPERIMENTAL dependency

The CONFIG_EXPERIMENTAL config item has not carried much meaning
for a while now and is almost always enabled by default. As
agreed during the Linux kernel summit, remove it from any
"depends on" lines in Kconfigs.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20130122210119.GA311@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3d48aab1 18-Jan-2013 Mika Westerberg <mika.westerberg@linux.intel.com>

x86: add support for Intel Low Power Subsystem

We are starting to see traditional SoC peripherals also in the x86 world in
chips like Intel Lynxpoint. Typically we already have a Linux driver for
the peripheral but it takes advantage of the common clk framework to
control and retrieve information about the peripheral clock.

So far there hasn't been a standard way on x86 to pass information such as
clock rate from whatever the configuration system is used to the driver,
but instead different variations have emerged, like adding this information
to the platform data.

Solve this by adding a new config option X86_INTEL_LPSS. If this is
selected we enable common clk framework (and everything else) that is
needed to support the Intel LPSS drivers.

Enabling common clk framework on x86 was originally proposed by Mark Brown.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ddd70cf9 21-Jan-2013 Jun Nakajima <jnakajim@gmail.com>

goldfish: platform device for x86

Based on code by Jun Nakajima but stripped of all the old x86 mach-foo
stuff and turned into a single file for the Goldfish virtual bus layer.

The actual created platform device and bus enumeration is portable between the
ARM and x86 Goldfish emulations.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Link: http://lkml.kernel.org/r/20130121172205.19517.22535.stgit@bob.linux.org.uk
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Xiaohui Xin <xiaohui.xin@intel.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Bruce Beare <bruce.j.beare@intel.com>
[Ported to 3.7 and reorganised so that we can keep most of the code
shared properly]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>


# e7dbfe34 28-Sep-2012 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

kprobes/x86: Move ftrace-based kprobe code into kprobes-ftrace.c

Split ftrace-based kprobes code from kprobes, and introduce
CONFIG_(HAVE_)KPROBES_ON_FTRACE Kconfig flags.
For the cleanup reason, this also moves kprobe_ftrace check
into skip_singlestep.

Link: http://lkml.kernel.org/r/20120928081520.3560.25624.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 06aeaaea 28-Sep-2012 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

ftrace: Move ARCH_SUPPORTS_FTRACE_SAVE_REGS in Kconfig

Move SAVE_REGS support flag into Kconfig and rename
it to CONFIG_DYNAMIC_FTRACE_WITH_REGS. This also introduces
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which indicates
the architecture depending part of ftrace has a code
that saves full registers.
On the other hand, CONFIG_DYNAMIC_FTRACE_WITH_REGS indicates
the code is enabled.

Link: http://lkml.kernel.org/r/20120928081516.3560.72534.stgit@ltc138.sdl.hitachi.co.jp

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e90c83f7 15-Jan-2013 John Stultz <john.stultz@linaro.org>

x86: Select HAS_PERSISTENT_CLOCK on x86

Select HAS_PERSISTENT_CLOCK on x86 to simplify RTC options
and allow the compiler to remove unused code.

Signed-off-by: John Stultz <john.stultz@linaro.org>


# 6ea30386 02-Oct-2012 Kees Cook <keescook@chromium.org>

arch/x86: remove depends on CONFIG_EXPERIMENTAL

The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>


# ffee0de4 20-Dec-2012 David Woodhouse <dwmw2@infradead.org>

x86: Default to ARCH=x86 to avoid overriding CONFIG_64BIT

It is easy to waste a bunch of time when one takes a 32-bit .config
from a test machine and try to build it on a faster 64-bit system, and
its existing setting of CONFIG_64BIT=n gets *changed* to match the
build host. Similarly, if one has an existing build tree it is easy
to trash an entire build tree that way.

This is because the default setting for $ARCH when discovered from
'uname' is one of the legacy pre-x86-merge values (i386 or x86_64),
which effectively force the setting of CONFIG_64BIT to match. We should
default to ARCH=x86 instead, finally completing the merge that we
started so long ago.

This patch preserves the behaviour of the legacy ARCH settings for commands
such as:

make ARCH=x86_64 randconfig
make ARCH=i386 randconfig

... since making the value of CONFIG_64BIT actually random in that situation
is not desirable.

In time, perhaps we can retire this legacy use of the old ARCH= values.
We already have a way to override values for *any* config option, using
$KCONFIG_ALLCONFIG, so it could be argued that we don't necessarily need
to keep ARCH={i386,x86_64} around as a special case just for overriding
CONFIG_64BIT.

We'd probably at least want to add a way to override config options from
the command line ('make CONFIG_FOO=y oldconfig') before we talk about doing
that though.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1356040315.3198.51.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 6bf9adfc 14-Dec-2012 Al Viro <viro@zeniv.linux.org.uk>

introduce generic sys_sigaltstack(), switch x86 and um to it

Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not
select it are completely unaffected

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ae903caa 13-Dec-2012 Al Viro <viro@zeniv.linux.org.uk>

Bury the conditionals from kernel_thread/kernel_execve series

All architectures have
CONFIG_GENERIC_KERNEL_THREAD
CONFIG_GENERIC_KERNEL_EXECVE
__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# cbee9f88 25-Oct-2012 Peter Zijlstra <a.p.zijlstra@chello.nl>

mm: numa: Add fault driven placement and migration

NOTE: This patch is based on "sched, numa, mm: Add fault driven
placement and migration policy" but as it throws away all the policy
to just leave a basic foundation I had to drop the signed-offs-by.

This patch creates a bare-bones method for setting PTEs pte_numa in the
context of the scheduler that when faulted later will be faulted onto the
node the CPU is running on. In itself this does nothing useful but any
placement policy will fundamentally depend on receiving hints on placement
from fault context and doing something intelligent about it.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>


# f9726bfd 07-Dec-2012 Daniel J Blueman <daniel@numascale-asia.com>

x86/PCI: Add NumaChip remote PCI support

Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but
preventing access to AMD Northbridges which shouldn't respond.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 91d1aa43 27-Nov-2012 Frederic Weisbecker <fweisbec@gmail.com>

context_tracking: New context tracking susbsystem

Create a new subsystem that probes on kernel boundaries
to keep track of the transitions between level contexts
with two basic initial contexts: user or kernel.

This is an abstraction of some RCU code that use such tracking
to implement its userspace extended quiescent state.

We need to pull this up from RCU into this new level of indirection
because this tracking is also going to be used to implement an "on
demand" generic virtual cputime accounting. A necessary step to
shutdown the tick while still accounting the cputime.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
[ paulmck: fix whitespace error and email address. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


# 7ac468b1 28-Nov-2012 H. Peter Anvin <hpa@linux.intel.com>

x86, 386 removal: Remove CONFIG_XADD

All 486+ CPUs support XADD, so remove the fallback 386 support
code.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-4-git-send-email-hpa@linux.intel.com


# eb068e78 28-Nov-2012 H. Peter Anvin <hpa@linux.intel.com>

x86, 386 removal: Remove CONFIG_M386 from Kconfig

Remove the CONFIG_M386 symbol from Kconfig so that it cannot be
selected.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-2-git-send-email-hpa@linux.intel.com


# 1d4b4b29 22-Oct-2012 Al Viro <viro@zeniv.linux.org.uk>

x86, um: switch to generic fork/vfork/clone

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 6147a9d8 19-Oct-2012 Frederic Weisbecker <fweisbec@gmail.com>

irq_work: Remove CONFIG_HAVE_IRQ_WORK

irq work can run on any arch even without IPI
support because of the hook on update_process_times().

So lets remove HAVE_IRQ_WORK because it doesn't reflect
any backend requirement.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>


# a71c8bc5 13-Nov-2012 Fenghua Yu <fenghua.yu@intel.com>

x86, topology: Debug CPU0 hotplug

CONFIG_DEBUG_HOTPLUG_CPU0 is for debugging the CPU0 hotplug feature. The switch
offlines CPU0 as soon as possible and boots userspace up with CPU0 offlined.
User can online CPU0 back after boot time. The default value of the switch is
off.

To debug CPU0 hotplug, you need to enable CPU0 offline/online feature by either
turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during compilation or giving
cpu0_hotplug kernel parameter at boot.

It's safe and early place to take down CPU0 after all hotplug notifiers
are installed and SMP is booted.

Please note that some applications or drivers, e.g. some versions of udevd,
during boot time may put CPU0 online again in this CPU0 hotplug debug mode.

In this debug mode, setup_local_APIC() may report a warning on max_loops<=0
when CPU0 is onlined back after boot time. This is because pending interrupt in
IRR can not move to ISR. The warning is not CPU0 specfic and it can happen on
other CPUs as well. It is harmless except the first CPU0 online takes a bit
longer time. And so this debug mode is useful to expose this issue. I'll send
a seperate patch to fix this generic warning issue.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-15-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 80aa1dff 13-Nov-2012 Fenghua Yu <fenghua.yu@intel.com>

x86, Kconfig: Add config switch for CPU0 hotplug

New config switch CONFIG_BOOTPARAM_HOTPLUG_CPU0 sets default state of whether
the CPU0 hotplug is on or off.

If the switch is off, CPU0 is not hotpluggable by default. But the CPU0 hotplug
feature can still be turned on by kernel parameter cpu0_hotplug at boot.

If the switch is on, CPU0 is always hotpluggable.

The default value of the switch is off.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 6e87f9b7 25-Oct-2012 Bin Gao <bin.gao@linux.intel.com>

arch/x86/Kconfig: Allow turning off CONFIG_X86_MPPARSE when either ACPI or SFI is present

MPS tables are not needed for systems that have proper ACPI
support. This is also true for systems that have SFI in place.

So this patch allows the configuration (turning off) of
CONFIG_X86_MPPARSE when either ACPI or SFI is present.

Signed-off-by: Bin Gao <bin.gao@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Bin Gao <bin.gao@linux.intel.com>
Link: http://lkml.kernel.org/r/20121025163544.GB38477@bingao-desk1.fm.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 22e2430d 10-Oct-2012 Al Viro <viro@zeniv.linux.org.uk>

x86, um: convert to saner kernel_execve() semantics

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 15626062 08-Oct-2012 Gerald Schaefer <gerald.schaefer@linux.ibm.com>

thp, x86: introduce HAVE_ARCH_TRANSPARENT_HUGEPAGE

Cleanup patch in preparation for transparent hugepage support on s390.
Adding new architectures to the TRANSPARENT_HUGEPAGE config option can
make the "depends" line rather ugly, like "depends on (X86 || (S390 &&
64BIT)) && MMU".

This patch adds a HAVE_ARCH_TRANSPARENT_HUGEPAGE instead. x86 already has
MMU "def_bool y", so the MMU check is superfluous there and
HAVE_ARCH_TRANSPARENT_HUGEPAGE can be selected in arch/x86/Kconfig.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7ac57a89 08-Oct-2012 Catalin Marinas <catalin.marinas@arm.com>

Kconfig: clean up the "#if defined(arch)" list for exception-trace sysctl entry

Introduce SYSCTL_EXCEPTION_TRACE config option and selec it in the
architectures requiring support for the "exception-trace" debug_table
entry in kernel/sysctl.c.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b69ec42b 08-Oct-2012 Catalin Marinas <catalin.marinas@arm.com>

Kconfig: clean up the long arch list for the DEBUG_KMEMLEAK config option

Introduce HAVE_DEBUG_KMEMLEAK config option and select it in corresponding
architecture Kconfig files. DEBUG_KMEMLEAK now only depends on
HAVE_DEBUG_KMEMLEAK.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# af1839eb 08-Oct-2012 Catalin Marinas <catalin.marinas@arm.com>

Kconfig: clean up the long arch list for the UID16 config option

Introduce HAVE_UID16 config option and select it in corresponding
architecture Kconfig files. UID16 now only depends on HAVE_UID16.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7076aada 10-Sep-2012 Al Viro <viro@zeniv.linux.org.uk>

x86: split ret_from_fork

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 786d35d4 27-Sep-2012 David Howells <dhowells@redhat.com>

Make most arch asm/module.h files use asm-generic/module.h

Use the mapping of Elf_[SPE]hdr, Elf_Addr, Elf_Sym, Elf_Dyn, Elf_Rel/Rela,
ELF_R_TYPE() and ELF_R_SYM() to either the 32-bit version or the 64-bit version
into asm-generic/module.h for all arches bar MIPS.

Also, use the generic definition mod_arch_specific where possible.

To this end, I've defined three new config bools:

(*) HAVE_MOD_ARCH_SPECIFIC

Arches define this if they don't want to use the empty generic
mod_arch_specific struct.

(*) MODULES_USE_ELF_RELA

Arches define this if their modules can contain RELA records. This causes
the Elf_Rela mapping to be emitted and allows apply_relocate_add() to be
defined by the arch rather than have the core emit an error message.

(*) MODULES_USE_ELF_REL

Arches define this if their modules can contain REL records. This causes
the Elf_Rel mapping to be emitted and allows apply_relocate() to be
defined by the arch rather than have the core emit an error message.

Note that it is possible to allow both REL and RELA records: m68k and mips are
two arches that do this.

With this, some arch asm/module.h files can be deleted entirely and replaced
with a generic-y marker in the arch Kbuild file.

Additionally, I have removed the bits from m32r and score that handle the
unsupported type of relocation record as that's now handled centrally.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# edf55fda 11-Jul-2012 Frederic Weisbecker <fweisbec@gmail.com>

x86: Exit RCU extended QS on notify resume

do_notify_resume() may be called on irq or exception
exit. But at that time the exception has already called
rcu_user_enter() and the irq has already called rcu_irq_exit().

Since it can use RCU read side critical section, we must call
rcu_user_exit() before doing anything there. Then we must call
back rcu_user_enter() after this function because we know we are
going to userspace from there.

This complete support for userspace RCU extended quiescent state
in x86-64.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>


# fdf9c356 09-Sep-2012 Frederic Weisbecker <fweisbec@gmail.com>

cputime: Make finegrained irqtime accounting generally available

There is no known reason for this option to be unavailable on other
archs than x86. They just need to call enable_sched_clock_irqtime()
if they have a sufficiently finegrained clock to make it working.

Move it to the general option and let the user choose between
it and pure tick based or virtual cputime accounting.

Note that virtual cputime accounting already performs a finegrained
irqtime accounting. CONFIG_IRQ_TIME_ACCOUNTING is a kind of middle ground
between tick and virtual based accounting. So CONFIG_IRQ_TIME_ACCOUNTING
and CONFIG_VIRT_CPU_ACCOUNTING are mutually exclusive choices.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>


# 650ea024 04-Sep-2012 John Stultz <john.stultz@linaro.org>

time: Convert x86_64 to using new update_vsyscall

Switch x86_64 to using sub-ns precise vsyscall

Cc: Tony Luck <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>


# 70639421 04-Sep-2012 John Stultz <john.stultz@linaro.org>

time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD

To help migrate archtectures over to the new update_vsyscall method,
redfine CONFIG_GENERIC_TIME_VSYSCALL as CONFIG_GENERIC_TIME_VSYSCALL_OLD

Cc: Tony Luck <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>


# 51ae4a2d 21-Sep-2012 H. Peter Anvin <hpa@linux.intel.com>

x86, smap: Add a header file with macros for STAC/CLAC

The STAC/CLAC instructions are only available with SMAP, but on the
other hand they aren't needed if SMAP is not available, or before we
start to run userspace, so construct them as alternatives which start
out as noops and are enabled by the alternatives mechanism.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1348256595-29119-7-git-send-email-hpa@linux.intel.com


# e57dbaf7 13-Sep-2011 Borislav Petkov <borislav.petkov@amd.com>

x86, mce: Enable MCA support by default

MCA is the basic support for hardware error logging and reporting, and
it is majorly unwise to run without it so enable machine check software
support by default on x86.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>


# 3120e25e 09-Sep-2012 Jan Beulich <JBeulich@suse.com>

x86/Kconfig: Clean up Kconfig defaults

The main goal here is to have the resulting .config no carry any
options that aren't enabled and can't be (i.e such where the
default is "no" and can't be changed), so that if any such
option later gets a user visible prompt, the user will actually
be prompted on a "make ...oldconfig" rather than keeping the
previously invisible option disabled.

There's a little bit of other trivial cleanup mixed in here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/504DEE19020000780009A285@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4454d327 02-Sep-2012 Joe Millenbach <jmillenbach@gmail.com>

x86/kconfig: Remove outdated reference to Intel CPUs in CONFIG_SWIOTLB

Deleted the no longer valid example of which x86 CPUs lack a
hardware IOMMU, and moved the "If unsure..." statement to a new
line to follow the style of surrounding options.

Signed-off-by: Joe Millenbach <jmillenbach@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: team-fjord@googlegroups.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1346632700-29113-1-git-send-email-jmillenbach@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d57c5d51 09-Feb-2011 Steven Rostedt <srostedt@redhat.com>

ftrace/x86: Add support for -mfentry to x86_64

If the kernel is compiled with gcc 4.6.0 which supports -mfentry,
then use that instead of mcount.

With mcount, frame pointers are forced with the -pg option and we
get something like:

<can_vma_merge_before>:
55 push %rbp
48 89 e5 mov %rsp,%rbp
53 push %rbx
41 51 push %r9
e8 fe 6a 39 00 callq ffffffff81483d00 <mcount>
31 c0 xor %eax,%eax
48 89 fb mov %rdi,%rbx
48 89 d7 mov %rdx,%rdi
48 33 73 30 xor 0x30(%rbx),%rsi
48 f7 c6 ff ff ff f7 test $0xfffffffff7ffffff,%rsi

With -mfentry, frame pointers are no longer forced and the call looks
like this:

<can_vma_merge_before>:
e8 33 af 37 00 callq ffffffff81461b40 <__fentry__>
53 push %rbx
48 89 fb mov %rdi,%rbx
31 c0 xor %eax,%eax
48 89 d7 mov %rdx,%rdi
41 51 push %r9
48 33 73 30 xor 0x30(%rbx),%rsi
48 f7 c6 ff ff ff f7 test $0xfffffffff7ffffff,%rsi

This adds the ftrace hook at the beginning of the function before a
frame is set up, and allows the function callbacks to be able to access
parameters. As kprobes now can use function tracing (at least on x86)
this speeds up the kprobe hooks that are at the beginning of the
function.

Link: http://lkml.kernel.org/r/20120807194100.130477900@goodmis.org

Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 90993cdd 16-Aug-2012 Marcelo Tosatti <mtosatti@redhat.com>

x86: KVM guest: merge CONFIG_KVM_CLOCK into CONFIG_KVM_GUEST

The distinction between CONFIG_KVM_CLOCK and CONFIG_KVM_GUEST is
not so clear anymore, as demonstrated by recent bugs caused by poor
handling of on/off combinations of these options.

Merge CONFIG_KVM_CLOCK into CONFIG_KVM_GUEST.

Reported-By: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>


# e43f6e67 01-Aug-2012 Borislav Petkov <borislav.petkov@amd.com>

x86, microcode: Straighten out Kconfig text

Update and clarify Kconfig help text along with menu names.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1344361461-10076-6-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# c5ebcedb 07-Aug-2012 Jiri Olsa <jolsa@redhat.com>

perf: Add ability to attach user stack dump to sample

Introducing PERF_SAMPLE_STACK_USER sample type bit to trigger the dump
of the user level stack on sample. The size of the dump is specified by
sample_stack_user value.

Being able to dump parts of the user stack, starting from the stack
pointer, will be useful to make a post mortem dwarf CFI based stack
unwinding.

Added HAVE_PERF_USER_STACK_DUMP config option to determine if the
architecture provides user stack dump on perf event samples. This needs
access to the user stack pointer which is not unified across
architectures. Enabling this for x86 architecture.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-6-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# c5e63197 07-Aug-2012 Jiri Olsa <jolsa@redhat.com>

perf: Unified API to record selective sets of arch registers

This brings a new API to help the selective dump of registers on event
sampling, and its implementation for x86 arch.

Added HAVE_PERF_REGS config option to determine if the architecture
provides perf registers ABI.

The information about desired registers will be passed in u64 mask.
It's up to the architecture to map the registers into the mask bits.

For the x86 arch implementation, both 32 and 64 bit registers bits are
defined within single enum to ensure 64 bit system can provide register
dump for compat task if needed in the future.

Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Added missing linux/errno.h include ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-2-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# c1d7e01d 30-Jul-2012 Will Deacon <will@kernel.org>

ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION

Rather than #define the options manually in the architecture code, add
Kconfig options for them and select them there instead. This also allows
us to select the compat IPC version parsing automatically for platforms
using the old compat IPC interface.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7463449b 30-Jul-2012 Catalin Marinas <catalin.marinas@arm.com>

atomic64_test: simplify the #ifdef for atomic64_dec_if_positive() test

Introduce CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE and use this instead
of the multitude of #if defined() checks in atomic64_test.c

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2a8ac745 06-Jul-2012 Jean Delvare <jdelvare@suse.de>

x86: CONFIG_CC_STACKPROTECTOR=y is no longer experimental

This feature has been around for over 5 years now, and has no
CONFIG_EXPERIMENTAL dependency anymore, so remove the '(EXPERIMENTAL)'
tag from the help text as well.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/1341583705.4655.18.camel@amber.site
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0c759662 15-Mar-2012 Matt Fleming <matt.fleming@intel.com>

x86, efi: Add EFI boot stub documentation

Since we can't expect every user to read the EFI boot stub code it
seems prudent to have a couple of paragraphs explaining what it is and
how it works.

The "initrd=" option in particular is tricky because it only
understands absolute EFI-style paths (backslashes as directory
separators), and until now this hasn't been documented anywhere. This
has tripped up a couple of users.

Cc: Matthew Garrett <mjg@redhat.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1331907517-3985-4-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 5723aa99 26-May-2012 Linus Torvalds <torvalds@linux-foundation.org>

x86: use the new generic strnlen_user() function

This throws away the old x86-specific functions in favor of the generic
optimized version.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4ae73f2d 26-May-2012 Linus Torvalds <torvalds@linux-foundation.org>

x86: use generic strncpy_from_user routine

The generic strncpy_from_user() is not really optimal, since it is
designed to work on both little-endian and big-endian. And on
little-endian you can simplify much of the logic to find the first zero
byte, since little-endian arithmetic doesn't have to worry about the
carry bit propagating into earlier bytes (only later bytes, which we
don't care about).

But I have patches to make the generic routines use the architecture-
specific <asm/word-at-a-time.h> infrastructure, so that we can regain
the little-endian optimizations. But before we do that, switch over to
the generic routines to make the patches each do just one well-defined
thing.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 764e0da1 21-May-2012 Thomas Gleixner <tglx@linutronix.de>

timers: Fixup the Kconfig consolidation fallout

Sigh, I missed to check which architecture Kconfig files actually
include the core Kconfig file. There are a few which did not. So we
broke them.

Instead of adding the includes to those, we are better off to move the
include to init/Kconfig like we did already with irqs and others.

This does not change anything for the architectures using the old
style periodic timer mode. It just solves the build wreckage there.

For those architectures which use the clock events infrastructure it
moves the include of the core Kconfig file to "General setup" which is
a way more logical place than having it at random locations specified
by the architecture specific Kconfigs.

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@glx-um.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# e47b65b0 21-May-2012 Sam Ravnborg <sam@ravnborg.org>

net: drop NET dependency from HAVE_BPF_JIT

There is no point having the NET dependency on the select target, as it
forces all users to depend on NET to tell they support BPF_JIT. Move
the config option to the bottom of the file - this could be a nice place
also for future "selectable" config symbols.

Fix up all users to drop the dependency on NET now that it is not
required to supress warnings for non-NET builds.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0a2b9a6e 29-Dec-2011 Marek Szyprowski <m.szyprowski@samsung.com>

X86: integrate CMA with DMA-mapping subsystem

This patch adds support for CMA to dma-mapping subsystem for x86
architecture that uses common pci-dma/pci-nommu implementation. This
allows to test CMA on KVM/QEMU and a lot of common x86 boxes.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>


# bdebaf80 18-May-2012 Thomas Gleixner <tglx@linutronix.de>

x86: Use generic time config

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@glx-um.de>
Link: http://lkml.kernel.org/r/20120518163104.630579708@glx-um.de
Cc: x86@kernel.org


# bb8187d3 17-May-2012 Paul Gortmaker <paul.gortmaker@windriver.com>

MCA: delete all remaining traces of microchannel bus support.

Hardware with MCA bus is limited to 386 and 486 class machines
that are now 20+ years old and typically with less than 32MB
of memory. A quick search on the internet, and you see that
even the MCA hobbyist/enthusiast community has lost interest
in the early 2000 era and never really even moved ahead from
the 2.4 kernels to the 2.6 series.

This deletes anything remaining related to CONFIG_MCA from core
kernel code and from the x86 architecture. There is no point in
carrying this any further into the future.

One complication to watch for is inadvertently scooping up
stuff relating to machine check, since there is overlap in
the TLA name space (e.g. arch/x86/boot/mca.c).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# ead91d4b 16-Apr-2012 Shai Fultheim <shai@scalemp.com>

x86/vsmp: Fix number of CPUs when vsmp is disabled

In case CONFIG_X86_VSMP is not set, limit the number of CPUs to
the number of CPUs of the first board.

Also make CONFIG_X86_VSMP depend on CONFIG_SMP, as there's
little point in having a vsmp machine with a single CPU.

Signed-off-by: Shai Fultheim <shai@scalemp.com>
[ido@wizery.com: rebased, fixed minor coding-style issues]
Signed-off-by: Ido Yariv <ido@wizery.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 85f7f656 07-May-2012 Thomas Gleixner <tglx@linutronix.de>

x86: Use kick_all_cpus_sync()

Use kick_all_cpus_sync() and remove cpu_idle_wait().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120507175652.190382227@linutronix.de
Cc: x86@kernel.org


# a6359d1e 03-May-2012 Thomas Gleixner <tglx@linutronix.de>

init_task: Replace CONFIG_HAVE_GENERIC_INIT_TASK

Now that all archs except ia64 are converted, replace the config and
let the ia64 select CONFIG_ARCH_INIT_TASK

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120503085035.867948914@linutronix.de


# 45046892 03-May-2012 Thomas Gleixner <tglx@linutronix.de>

x86: Use generic init_task

Same code. Use the generic version. The special Makefile treatment is
pointless anyway as init_task.o contains only data which is handled by
the linker script. So no point on being treated like head text.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120503085035.739963562@linutronix.de
Cc: x86@kernel.org


# e419b4cc 03-May-2012 Linus Torvalds <torvalds@linux-foundation.org>

vfs: make word-at-a-time accesses handle a non-existing page

It turns out that there are more cases than CONFIG_DEBUG_PAGEALLOC that
can have holes in the kernel address space: it seems to happen easily
with Xen, and it looks like the AMD gart64 code will also punch holes
dynamically.

Actually hitting that case is still very unlikely, so just do the
access, and take an exception and fix it up for the very unlikely case
of it being a page-crosser with no next page.

And hey, this abstraction might even help other architectures that have
other issues with unaligned word accesses than the possible missing next
page. IOW, this could do the byte order magic too.

Peter Anvin fixed a thinko in the shifting for the exception case.

Reported-and-tested-by: Jana Saout <jana@saout.de>
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4a6d70c9 24-Apr-2012 Steven Rostedt <srostedt@redhat.com>

ftrace/x86: Remove the complex ftrace NMI handling code

As ftrace function tracing would require modifying code that could
be executed in NMI context, which is not stopped with stop_machine(),
ftrace had to do a complex algorithm with various stages of setup
and memory barriers to make it work.

With the new breakpoint method, this is no longer required. The changes
to the code can be done without any problem in NMI context, as well as
without stop machine altogether. Remove the complex code as it is
no longer needed.

Also, a lot of the notrace annotations could be removed from the
NMI code as it is now safe to trace them. With the exception of
do_nmi itself, which does some special work to handle running in
the debug stack. The breakpoint method can cause NMIs to double
nest the debug stack if it's not setup properly, and that is done
in do_nmi(), thus that function must not be traced.

(Note the arch sh may want to do the same)

Cc: Paul Mundt <lethal@linux-sh.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7eb43a6d 20-Apr-2012 Thomas Gleixner <tglx@linutronix.de>

x86: Use generic idle thread allocation

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20120420124557.246929343@linutronix.de


# 8b5ad472 24-Apr-2012 David Daney <ddaney@caviumnetworks.com>

Revert "x86, extable: Disable presorted exception table for now"

sortextable now works with relative entries, re-enable it.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Link: http://lkml.kernel.org/r/1335291795-26693-3-git-send-email-ddaney.cavm@gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# fa574a48 20-Apr-2012 H. Peter Anvin <hpa@zytor.com>

x86, extable: Disable presorted exception table for now

Disable presorting the exception table in preparation for changing the
format.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: David Daney <david.daney@cavium.com>
Link: http://lkml.kernel.org/r/CA%2B55aFyijf43qSu3N9nWHEBwaGbb7T2Oq9A=9EyR=Jtyqfq_cQ@mail.gmail.com


# d405c601 19-Apr-2012 David Daney <david.daney@cavium.com>

x86: Select BUILDTIME_EXTABLE_SORT

We can sort the exeception table at build time for x86, so let's do
it.

Signed-off-by: David Daney <david.daney@cavium.com>
Link: http://lkml.kernel.org/r/1334872799-14589-6-git-send-email-ddaney.cavm@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 30261691 06-Apr-2012 Sam Ravnborg <sam@ravnborg.org>

x86: Drop obsolete ARCH_BOOTMEM support

x86 unconditionally uses NO_BOOTMEM so there is no use
of the HAVE_ARCH_BOOTMEM support as mm/bootmem.c is the
only file referencing this symbol.

bootmem_arch_preferred_node() is the function referred
in the mm/bootmem.c code and can thuis be dropped too.

x86 was the sole user of HAVE_ARCH_BOOTMEM - so there is
an opportunity to clean up a little in mm/bootmem.c too
if we do not expect other users to emerge.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20120406124735.GA6920@merkur.ravnborg.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c6cfbeb4 12-Apr-2012 Will Drewry <wad@chromium.org>

x86: Enable HAVE_ARCH_SECCOMP_FILTER

Enable support for seccomp filter on x86:
- syscall_get_arch()
- syscall_get_arguments()
- syscall_rollback()
- syscall_set_return_value()
- SIGSYS siginfo_t support
- secure_computing is called from a ptrace_event()-safe context
- secure_computing return value is checked (see below).

SECCOMP_RET_TRACE and SECCOMP_RET_TRAP may result in seccomp needing to
skip a system call without killing the process. This is done by
returning a non-zero (-1) value from secure_computing. This change
makes x86 respect that return value.

To ensure that minimal kernel code is exposed, a non-zero return value
results in an immediate return to user space (with an invalid syscall
number).

Signed-off-by: Will Drewry <wad@chromium.org>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>

v18: rebase and tweaked change description, acked-by
v17: added reviewed by and rebased
v..: all rebases since original introduction.
Signed-off-by: James Morris <james.l.morris@oracle.com>


# 83125a3a 04-Apr-2012 Alessandro Rubini <rubini@gnudd.com>

x86, platform: Initial support for sta2x11 I/O hub

The "ConneXt" sta2x11 I/O Hub is a bridge from PCIe to AMBA, and is
used as main chipset in some Atom boards. The set of peripherals it
exports live in an AMBA bus internal to the chip, so a custom
remapping of addresses is needed. This is implemented by fixup calls
for the PCI deivices, based on CONFIG_X86_DEV_DMA_OPS and
CONFIG_X86_DMA_REMAP .

Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Link: http://lkml.kernel.org/r/ddca670ca8180e52d49b3fe642742ddd23ab2cb2.1333560789.git.rubini@gnudd.com
Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# f7219a53 04-Apr-2012 Alessandro Rubini <rubini@gnudd.com>

x86: Introduce CONFIG_X86_DMA_REMAP

The default functions phys_to_dma, dma_to_phys implement identity
mapping as fast inline functions. Some systems, however, may need a
custom function to implement its own mapping between CPU addresses and
device addresses. This new configuration option allows the functions
to be external when needed (such as for the ConneXt device)

Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Link: http://lkml.kernel.org/r/6e4329b772df675f1c442f68e59e844e4dd8c965.1333560789.git.rubini@gnudd.com
Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 4692d77f 04-Apr-2012 Alessandro Rubini <rubini@gnudd.com>

x86-32: Introduce CONFIG_X86_DEV_DMA_OPS

32-bit x86 systems may need their own DMA operations, so add
a new config option, which is turned on for 64-bit systems. This
patch has no functional effect but it paves the way for supporting
the STA2x11 I/O Hub and possibly other chips.

Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Link: http://lkml.kernel.org/r/f79fcc1a2e17ef942e1b798b92aac43a80202532.1333560789.git.rubini@gnudd.com
Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 3197059a 14-Jan-2012 Philip A. Prindeville <philipp@redfish-solutions.com>

geos: Platform driver for Geos and Geos2 single-board computers.

Trivial platform driver for Traverse Technologies Geos and Geos2
single-board computers. Uses SMBIOS to identify platform.
Based on progressive revisions of the leds-net5501 driver that
was rewritten by Ed Wildgoose as a platform driver.

Supports GPIO-based LEDs (3) and 1 polled button which is
typically used for a soft reset.

Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
Reviewed-by: Ed Wildgoose <ed@wildgooses.com>
Acked-by: Andres Salomon <dilinger@queued.net>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Garrett <mjg@redhat.com>


# 48b25c43 15-Mar-2012 Chris Metcalf <cmetcalf@tilera.com>

[PATCH v3] ipc: provide generic compat versions of IPC syscalls

When using the "compat" APIs, architectures will generally want to
be able to make direct syscalls to msgsnd(), shmctl(), etc., and
in the kernel we would want them to be handled directly by
compat_sys_xxx() functions, as is true for other compat syscalls.

However, for historical reasons, several of the existing compat IPC
syscalls do not do this. semctl() expects a pointer to the fourth
argument, instead of the fourth argument itself. msgsnd(), msgrcv()
and shmat() expect arguments in different order.

This change adds an ARCH_WANT_OLD_COMPAT_IPC config option that can be
set to preserve this behavior for ports that use it (x86, sparc, powerpc,
s390, and mips). No actual semantics are changed for those architectures,
and there is only a minimal amount of code refactoring in ipc/compat.c.

Newer architectures like tile (and perhaps future architectures such
as arm64 and unicore64) should not select this option, and thus can
avoid having any IPC-specific code at all in their architecture-specific
compat layer. In the same vein, if this option is not selected, IPC_64
mode is assumed, since that's what the <asm-generic> headers expect.

The workaround code in "tile" for msgsnd() and msgrcv() is removed
with this change; it also fixes the bug that shmat() and semctl() were
not being properly handled.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>


# bfcfaa77 06-Mar-2012 Linus Torvalds <torvalds@linux-foundation.org>

vfs: use 'unsigned long' accesses for dcache name comparison and hashing

Ok, this is hacky, and only works on little-endian machines with goo
unaligned handling. And even then only with CONFIG_DEBUG_PAGEALLOC
disabled, since it can access up to 7 bytes after the pathname.

But it runs like a bat out of hell.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# da4e3302 05-Mar-2012 Philip Prindeville <philipp@redfish-solutions.com>

x86/geode/net5501: Add platform driver for Soekris Engineering net5501

Add platform driver for the Soekris Engineering net5501 single-board
computer. Probes well-known locations in ROM for BIOS signature
to confirm correct platform. Registers 1 LED and 1 GPIO-based
button (typically used for soft reset).

Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Matthew Garrett <mjg@redhat.com>
[ Removed Kconfig and Makefile detritus from drivers/leds/]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-jv5uf34996juqh5syes8mn4h@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0bf62763 27-Feb-2012 H. Peter Anvin <hpa@linux.intel.com>

x32: Warn and disable rather than error if binutils too old

If X32 is enabled in .config, but the binutils can't build it, issue a
warning and disable the feature rather than erroring out.

In order to support this, have CONFIG_X86_X32 be the option set in
Kconfig, and CONFIG_X86_X32_ABI be the option set by the Makefile when
it is enabled and binutils has been found to be functional.

Requested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/1329696488-16970-1-git-send-email-hpa@zytor.com


# b4e51854 16-Dec-2011 Grant Likely <grant.likely@secretlab.ca>

irq_domain/x86: Convert x86 (embedded) to use common irq_domain

This patch removes the x86-specific definition of irq_domain and replaces
it with the common implementation.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Thomas Gleixner <tglx@linutronix.de>


# 5fd92e65 19-Feb-2012 H. J. Lu <hjl.tools@gmail.com>

x32: Allow x32 to be configured

At this point, one should be able to build an x32 kernel.

Note that for now we depend on CONFIG_IA32_EMULATION. Long term, x32
and IA32 should be detangled.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 2b144498 09-Feb-2012 Srikar Dronamraju <srikar@linux.vnet.ibm.com>

uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints

Add uprobes support to the core kernel, with x86 support.

This commit adds the kernel facilities, the actual uprobes
user-space ABI and perf probe support comes in later commits.

General design:

Uprobes are maintained in an rb-tree indexed by inode and offset
(the offset here is from the start of the mapping). For a unique
(inode, offset) tuple, there can be at most one uprobe in the
rb-tree.

Since the (inode, offset) tuple identifies a unique uprobe, more
than one user may be interested in the same uprobe. This provides
the ability to connect multiple 'consumers' to the same uprobe.

Each consumer defines a handler and a filter (optional). The
'handler' is run every time the uprobe is hit, if it matches the
'filter' criteria.

The first consumer of a uprobe causes the breakpoint to be
inserted at the specified address and subsequent consumers are
appended to this list. On subsequent probes, the consumer gets
appended to the existing list of consumers. The breakpoint is
removed when the last consumer unregisters. For all other
unregisterations, the consumer is removed from the list of
consumers.

Given a inode, we get a list of the mms that have mapped the
inode. Do the actual registration if mm maps the page where a
probe needs to be inserted/removed.

We use a temporary list to walk through the vmas that map the
inode.

- The number of maps that map the inode, is not known before we
walk the rmap and keeps changing.
- extending vm_area_struct wasn't recommended, it's a
size-critical data structure.
- There can be more than one maps of the inode in the same mm.

We add callbacks to the mmap methods to keep an eye on text vmas
that are of interest to uprobes. When a vma of interest is mapped,
we insert the breakpoint at the right address.

Uprobe works by replacing the instruction at the address defined
by (inode, offset) with the arch specific breakpoint
instruction. We save a copy of the original instruction at the
uprobed address.

This is needed for:

a. executing the instruction out-of-line (xol).
b. instruction analysis for any subsequent fixups.
c. restoring the instruction back when the uprobe is unregistered.

We insert or delete a breakpoint instruction, and this
breakpoint instruction is assumed to be the smallest instruction
available on the platform. For fixed size instruction platforms
this is trivially true, for variable size instruction platforms
the breakpoint instruction is typically the smallest (often a
single byte).

Writing the instruction is done by COWing the page and changing
the instruction during the copy, this even though most platforms
allow atomic writes of the breakpoint instruction. This also
mirrors the behaviour of a ptrace() memory write to a PRIVATE
file map.

The core worker is derived from KSM's replace_page() logic.

In essence, similar to KSM:

a. allocate a new page and copy over contents of the page that
has the uprobed vaddr
b. modify the copy and insert the breakpoint at the required
address
c. switch the original page with the copy containing the
breakpoint
d. flush page tables.

replace_page() is being replicated here because of some minor
changes in the type of pages and also because Hugh Dickins had
plans to improve replace_page() for KSM specific work.

Instruction analysis on x86 is based on instruction decoder and
determines if an instruction can be probed and determines the
necessary fixups after singlestep. Instruction analysis is done
at probe insertion time so that we avoid having to repeat the
same analysis every time a probe is hit.

A lot of code here is due to the improvement/suggestions/inputs
from Peter Zijlstra.

Changelog:

(v10):
- Add code to clear REX.B prefix as suggested by Denys Vlasenko
and Masami Hiramatsu.

(v9):
- Use insn_offset_modrm as suggested by Masami Hiramatsu.

(v7):

Handle comments from Peter Zijlstra:

- Dont take reference to inode. (expect inode to uprobe_register to be sane).
- Use PTR_ERR to set the return value.
- No need to take reference to inode.
- use PTR_ERR to return error value.
- register and uprobe_unregister share code.

(v5):

- Modified del_consumer as per comments from Peter.
- Drop reference to inode before dropping reference to uprobe.
- Use i_size_read(inode) instead of inode->i_size.
- Ensure uprobe->consumers is NULL, before __uprobe_unregister() is called.
- Includes errno.h as recommended by Stephen Rothwell to fix a build issue
on sparc defconfig
- Remove restrictions while unregistering.
- Earlier code leaked inode references under some conditions while
registering/unregistering.
- Continue the vma-rmap walk even if the intermediate vma doesnt
meet the requirements.
- Validate the vma found by find_vma before inserting/removing the
breakpoint
- Call del_consumer under mutex_lock.
- Use hash locks.
- Handle mremap.
- Introduce find_least_offset_node() instead of close match logic in
find_uprobe
- Uprobes no more depends on MM_OWNER; No reference to task_structs
while inserting/removing a probe.
- Uses read_mapping_page instead of grab_cache_page so that the pages
have valid content.
- pass NULL to get_user_pages for the task parameter.
- call SetPageUptodate on the new page allocated in write_opcode.
- fix leaking a reference to the new page under certain conditions.
- Include Instruction Decoder if Uprobes gets defined.
- Remove const attributes for instruction prefix arrays.
- Uses mm_context to know if the application is 32 bit.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Also-written-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux-mm <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/20120209092642.GE16600@linux.vnet.ibm.com
[ Made various small edits to the commit log ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fad12ac8 25-Jan-2012 Thomas Renninger <trenn@suse.de>

CPU: Introduce ARCH_HAS_CPU_AUTOPROBE and X86 parts

This patch is based on Andi Kleen's work:
Implement autoprobing/loading of modules serving CPU
specific features (x86cpu autoloading).

And Kay Siever's work to get rid of sysdev cpu structures
and making use of struct device instead.

Before, the cpuid driver had to be loaded to get the x86cpu
autoloading feature. With this patch autoloading works through
the /sys/devices/system/cpu object

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 15a713df 26-Jan-2012 Mika Westerberg <mika.westerberg@linux.intel.com>

x86/config: Select MSIC MFD driver on Intel Medfield platform

On Intel Medfield platform we use MSIC MFD driver to create
necessary platform devices so it is essential to have the driver
compiled into the kernel.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: jacob.jun.pan@linux.intel.com
Link: http://lkml.kernel.org/n/tip-7hp1otk4wf4mg5pqohcwt06w@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1a8359e4 26-Jan-2012 Alan Cox <alan@linux.intel.com>

x86/mid: Remove Intel Moorestown

All production devices operate in the Oaktrail configuration
with legacy PC elements present and an ACPI BIOS. Continue
stripping out the Moorestown elements from the tree leaving
Medfield.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: jacob.jun.pan@linux.intel.com
Link: http://lkml.kernel.org/n/tip-fvm1hgpq99jln6l0fbek68ik@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3fe54564 24-Jan-2012 Daniel J Blueman <daniel@numascale-asia.com>

x86/numachip: Drop unnecessary conflict with EDAC

EDAC detection no longer crashes multi-node systems, so don't
conflict on it with NumaChip.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Cc: Steffen Persvold <sp@numascale.com>
Link: http://lkml.kernel.org/r/1327473349-28395-1-git-send-email-daniel@numascale-asia.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2ed86b16 25-Jan-2012 Rob Herring <rob.herring@calxeda.com>

irq: make SPARSE_IRQ an optionally hidden option

On ARM, we don't want SPARSE_IRQ to be a user visible option. Make
SPARSE_IRQ visible based on MAY_HAVE_SPARSE_IRQ instead of depending
on HAVE_SPARSE_IRQ.

With this, SPARSE_IRQ is not visible on C6X and ARM.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org


# 5ee71535 16-Jan-2012 Randy Dunlap <rdunlap@infradead.org>

x86/kconfig: Move the ZONE_DMA entry under a menu

Move the ZONE_DMA kconfig symbol under a menu item instead
of having it listed before everything else in
"make {xconfig | gconfig | nconfig | menuconfig}".

This drops the first line of the top-level kernel config menu
(in 3.2) below and moves it under "Processor type and features".

[*] DMA memory allocation support
General setup --->
[*] Enable loadable module support --->
[*] Enable the block layer --->
Processor type and features --->
Power management and ACPI options --->
Bus options (PCI etc.) --->
Executable file formats / Emulations --->

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/4F14811E.6090107@xenotime.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>


# 2565409f 12-Jan-2012 Heiko Carstens <hca@linux.ibm.com>

mm,x86,um: move CMPXCHG_DOUBLE config option

Move CMPXCHG_DOUBLE and rename it to HAVE_CMPXCHG_DOUBLE so architectures
can simply select the option if it is supported.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4156153c 12-Jan-2012 Heiko Carstens <hca@linux.ibm.com>

mm,x86,um: move CMPXCHG_LOCAL config option

Move CMPXCHG_LOCAL and rename it to HAVE_CMPXCHG_LOCAL so architectures
can simply select the option if it is supported.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 43570fd2 12-Jan-2012 Heiko Carstens <hca@linux.ibm.com>

mm,slub,x86: decouple size of struct page from CONFIG_CMPXCHG_LOCAL

While implementing cmpxchg_double() on s390 I realized that we don't set
CONFIG_CMPXCHG_LOCAL despite the fact that we have support for it.

However setting that option will increase the size of struct page by
eight bytes on 64 bit, which we certainly do not want. Also, it doesn't
make sense that a present cpu feature should increase the size of struct
page.

Besides that it looks like the dependency to CMPXCHG_LOCAL is wrong and
that it should depend on CMPXCHG_DOUBLE instead.

This patch:

If an architecture supports CMPXCHG_LOCAL this shouldn't result
automatically in larger struct pages if the SLUB allocator is used.
Instead introduce a new config option "HAVE_ALIGNED_STRUCT_PAGE" which
can be selected if a double word aligned struct page is required. Also
update x86 Kconfig so that it should work as before.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e39f5602 10-Jan-2012 David Daney <ddaney.cavm@gmail.com>

fs: binfmt_elf: create Kconfig variable for PIE randomization

Randomization of PIE load address is hard coded in binfmt_elf.c for X86
and ARM. Create a new Kconfig variable
(CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE) for this and use it instead. Thus
architecture specific policy is pushed out of the generic binfmt_elf.c and
into the architecture Kconfig files.

X86 and ARM Kconfigs are modified to select the new variable so there is
no change in behavior. A follow on patch will select it for MIPS too.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7c9c3a1e 29-Dec-2011 Alan Cox <alan@linux.intel.com>

x86/intel config: Fix the APB_TIMER selection

Seems Kconfig SELECT isn't selecting things hierarchically when
selected.

config APB_TIMER
def_bool y if X86_INTEL_MID
prompt "Intel MID APB Timer Support" if X86_INTEL_MID
select DW_APB_TIMER
depends on X86_INTEL_MID && SFI

when we select APB_TIMER doesn't select DW_APB_TIMER so do it by
hand.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-kpnaimplltk6d1lolusqj3ae@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 933b9463 17-Dec-2011 Alan Cox <alan@linux.intel.com>

x86/intel config: Revamp configuration to allow for Moorestown and Medfield

This sets all up the other bits that need to be INTEL_MID
specific rather than Moorestown specific.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/20111217174318.7207.91543.stgit@bob.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a0c3832a 17-Dec-2011 Alan Cox <alan@linux.intel.com>

x86/apb: Fix configuration constraints

The APB timer requires SFI, SCU and MID support

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/20111217215719.3743.93550.stgit@bob.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3e8f9451 15-Dec-2011 Alan Cox <alan@linux.intel.com>

x86: Fix INTEL_MID silly

Doh.. pass the brown paper bags - preferably filled with mince
pies..

This fixes occasional build failures.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-r0oc1knlvzuqr69artaeq8s8@git.kernel.org
[ extended the changelog a bit ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 291f3632 12-Dec-2011 Matt Fleming <matt.fleming@intel.com>

x86, efi: EFI boot stub support

There is currently a large divide between kernel development and the
development of EFI boot loaders. The idea behind this patch is to give
the kernel developers full control over the EFI boot process. As
H. Peter Anvin put it,

"The 'kernel carries its own stub' approach been very successful in
dealing with BIOS, and would make a lot of sense to me for EFI as
well."

This patch introduces an EFI boot stub that allows an x86 bzImage to
be loaded and executed by EFI firmware. The bzImage appears to the
firmware as an EFI application. Luckily there are enough free bits
within the bzImage header so that it can masquerade as an EFI
application, thereby coercing the EFI firmware into loading it and
jumping to its entry point. The beauty of this masquerading approach
is that both BIOS and EFI boot loaders can still load and run the same
bzImage, thereby allowing a single kernel image to work in any boot
environment.

The EFI boot stub supports multiple initrds, but they must exist on
the same partition as the bzImage. Command-line arguments for the
kernel can be appended after the bzImage name when run from the EFI
shell, e.g.

Shell> bzImage console=ttyS0 root=/dev/sdb initrd=initrd.img

v7:
- Fix checkpatch warnings.

v6:

- Try to allocate initrd memory just below hdr->inird_addr_max.

v5:

- load_options_size is UTF-16, which needs dividing by 2 to convert
to the corresponding ASCII size.

v4:

- Don't read more than image->load_options_size

v3:

- Fix following warnings when compiling CONFIG_EFI_STUB=n

arch/x86/boot/tools/build.c: In function ‘main’:
arch/x86/boot/tools/build.c:138:24: warning: unused variable ‘pe_header’
arch/x86/boot/tools/build.c:138:15: warning: unused variable ‘file_sz’

- As reported by Matthew Garrett, some Apple machines have GOPs that
don't have hardware attached. We need to weed these out by
searching for ones that handle the PCIIO protocol.

- Don't allocate memory if no initrds are on cmdline
- Don't trust image->load_options_size

Maarten Lankhorst noted:
- Don't strip first argument when booted from efibootmgr
- Don't allocate too much memory for cmdline
- Don't update cmdline_size, the kernel considers it read-only
- Don't accept '\n' for initrd names

v2:

- File alignment was too large, was 8192 should be 512. Reported by
Maarten Lankhorst on LKML.
- Added UGA support for graphics
- Use VIDEO_TYPE_EFI instead of hard-coded number.
- Move linelength assignment until after we've assigned depth
- Dynamically fill out AddressOfEntryPoint in tools/build.c
- Don't use magic number for GDT/TSS stuff. Requested by Andi Kleen
- The bzImage may need to be relocated as it may have been loaded at
a high address address by the firmware. This was required to get my
macbook booting because the firmware loaded it at 0x7cxxxxxx, which
triggers this error in decompress_kernel(),

if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
error("Destination address too large");

Cc: Mike Waychison <mikew@google.com>
Cc: Matthew Garrett <mjg@redhat.com>
Tested-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1321383097.2657.9.camel@mfleming-mobl1.ger.corp.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 0ee332c1 08-Dec-2011 Tejun Heo <tj@kernel.org>

memblock: Kill early_node_map[]

Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEBLOCK_NODE_MAP -
there's no user of early_node_map[] left. Kill early_node_map[] and
replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP. Also,
relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h
as page_alloc.c would no longer host an alternative implementation.

This change is ultimately one to one mapping and shouldn't cause any
observable difference; however, after the recent changes, there are
some functions which now would fit memblock.c better than page_alloc.c
and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK
doesn't make much sense on some of them. Further cleanups for
functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.

-v2: Fix compile bug introduced by mis-spelling
CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in
mmzone.h. Reported by Stephen Rothwell.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>


# 4e2b1c4f 06-Dec-2011 Alan Cox <alan@linux.intel.com>

x86/intel_mid: Kconfig select fix

If we select a symbol it should have a type declared first
otherwise in some situations the config tools get upset. They
are currently perhaps a bit too resilient which is why this
wasn't noticed initially.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/20111206132811.4041.32549.stgit@bob.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# dd137525 05-Dec-2011 Alan Cox <alan@linux.intel.com>

x86/intel_mid: Fix the Kconfig for MID selection

We currently fail to build on CONFIG_X86_INTEL_MID=y and
CONFIG_X86_MRST unset.

We could build all the bits to make generic MID work if you
picked MID platform alone but that's really silly. Instead use
select and two variables.

This looks a bit daft right now but once we add a Medfield
selection it'll start to look a good deal more sensible.

Reported-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/20111205231433.28811.51297.stgit@bob.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f9b15df4 28-Oct-2011 Alessandro Rubini <rubini@gnudd.com>

x86/Kconfig: Cyclone-timer depends on x86-summit

CONFIG_X86_CYCLONE_TIMER depends on CONFIG_X86_32_NON_STANDARD,
which forces drivers/clocksource/cyclone.c to be compiled. The
file doesn't do anything unless enabled by
arch/x86/kernel/apic/summit_32.c

Make CONFIG_X86_CYCLONE_TIMER depend by X86_SUMMIT instead, to
avoid unnecessary code in other non-standard systems.

Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Cc: john stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20111028224842.GA7582@mail.gnudd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 44b111b5 05-Dec-2011 Steffen Persvold <sp@numascale.com>

x86: Add NumaChip support

Adds support for Numascale NumaChip large-SMP systems. It is
needed to enable the booting of more than ~168 cores.

v2:
- [Steffen] enumerate only accessible northbridges
- [Daniel] rediffed and validated against 3.1-rc10

v3:
- [Daniel] use x86_init core numbering override
- [Daniel] cleanups as per feedback

v4:
- [Daniel] use updated x86_cpuinit override

v5:
- drop disabling interrupts locally, as ISR write is atomic; drop delay
- added read-mostly annotations where appropriate
- require CONFIG_SMP, so drop conditional path

Workload tested on 96 cores/16 sockets.

Signed-off-by: Steffen Persvold <sp@numascale.com>
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Link: http://lkml.kernel.org/r/1323101246-2400-1-git-send-email-daniel@numascale-asia.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1ea7c673 10-Nov-2011 Alan Cox <alan@linux.intel.com>

x86/config: Revamp configuration for MID devices

This follows on from the patch applied in 3.2rc1 which creates
an INTEL_MID configuration. We can now add the entry for
Medfield specific code. After this is merged the final patch
will be submitted which moves the rest of the device Kconfig
dependancies to MRST/MEDFIELD/INTEL_MID as appropriate.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4673ca8e 24-Nov-2011 Michael S. Tsirkin <mst@redhat.com>

lib: move GENERIC_IOMAP to lib/Kconfig

define GENERIC_IOMAP in a central location
instead of all architectures. This will be helpful
for the follow-up patch which makes it select
other configs. Code is also a bit shorter this way.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 282e5aab 17-Nov-2011 Paul Bolle <pebolle@tiscali.nl>

x86: Kconfig: drop unknown symbol 'APM_MODULE'

There's no Kconfig symbol APM_MODULE, so the check for it will always
fail. There's no need to append _MODULE to tristate symbols anyhow,
because the config tools will do the right thing automagically.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 4f4d7a9b 24-Oct-2011 Paul Bolle <pebolle@tiscali.nl>

x86: drop unused Kconfig symbol

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>


# 141d55e6 12-Oct-2011 Yinghai Lu <yinghai.lu@oracle.com>

x86/irq: Standardize on CONFIG_SPARSE_IRQ=y

Sparseirq got introduced in v2.6.28 and Thomas did a huge cleanup
around v2.6.38 that eliminated basically all disadvantages
of it.

So we can remove non-sparseirq support now and simplify
our IRQ degrees of freedom a bit.

Suggested-and-acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4E95E21D.6090200@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 395cf969 14-Aug-2011 Paul Bolle <pebolle@tiscali.nl>

doc: fix broken references

There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typo's in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.

Fix these broken references, sometimes by dropping the irrelevant text
they were part of.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# d4f3e350 20-Sep-2011 Ed Wildgoose <git@wildgooses.com>

x86: geode: New PCEngines Alix system driver

This new driver replaces the old PCEngines Alix 2/3 LED driver with a
new driver that controls the LEDs through the leds-gpio driver. The
old driver accessed GPIOs directly, which created a conflict and
prevented also loading the cs5535-gpio driver to read other GPIOs on
the Alix board. With this new driver, we hook into leds-gpio which in
turn uses GPIO to control the LEDs and therefore it's possible to
control both the LEDs and access onboard GPIOs

Driver is moved to platform/geode as requested by Grant and any other
geode initialisation modules should move here also

This driver is inspired by leds-net5501.c by Alessandro Zummo.

Ideally, leds-net5501.c should also be moved to platform/geode.
Additionally the driver relies on parts of the patch: 7f131cf3ed ("leds:
leds-alix2c - take port address from MSR) by Daniel Mack to perform
detection of the Alix board.

[akpm@linux-foundation.org: include module.h]

Signed-off-by: Ed Wildgoose <kernel@wildgooses.com>
Cc: git@wildgooses.com
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Daniel Mack <daniel@caiaq.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# d3f13810 23-Aug-2011 Suresh Siddha <suresh.b.siddha@intel.com>

iommu: Rename the DMAR and INTR_REMAP config options

Change the CONFIG_DMAR to CONFIG_INTEL_IOMMU to be consistent
with the other IOMMU options.

Rename the CONFIG_INTR_REMAP to CONFIG_IRQ_REMAP to match the
irq subsystem name.

And define the CONFIG_DMAR_TABLE for the common ACPI DMAR
routines shared by both CONFIG_INTEL_IOMMU and CONFIG_IRQ_REMAP.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: youquan.song@intel.com
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.558630224@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d1748302 23-Aug-2011 Martin Schwidefsky <schwidefsky@de.ibm.com>

clockevents: Make minimum delay adjustments configurable

The automatic increase of the min_delta_ns of a clockevents device
should be done in the clockevents code as the minimum delay is an
attribute of the clockevents device.

In addition not all architectures want the automatic adjustment, on a
massively virtualized system it can happen that the programming of a
clock event fails several times in a row because the virtual cpu has
been rescheduled quickly enough. In that case the minimum delay will
erroneously be increased with no way back. The new config symbol
GENERIC_CLOCKEVENTS_MIN_ADJUST is used to enable the automatic
adjustment. The config option is selected only for x86.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: john stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20110823133142.494157493@de.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# df013ffb 12-Jul-2011 Huang Ying <ying.huang@intel.com>

Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG

cmpxchg() is widely used by lockless code, including NMI-safe lockless
code. But on some architectures, the cmpxchg() implementation is not
NMI-safe, on these architectures the lockless code may need a
spin_trylock_irqsave() based implementation.

This patch adds a Kconfig option: ARCH_HAVE_NMI_SAFE_CMPXCHG, so that
NMI-safe lockless code can depend on it or provide different
implementation according to it.

On many architectures, cmpxchg is only NMI-safe for several specific
operand sizes. So, ARCH_HAVE_NMI_SAFE_CMPXCHG define in this patch
only guarantees cmpxchg is NMI-safe for sizeof(unsigned long).

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Richard Henderson <rth@twiddle.net>
CC: Mikael Starvik <starvik@axis.com>
Acked-by: David Howells <dhowells@redhat.com>
CC: Yoshinori Sato <ysato@users.sourceforge.jp>
CC: Tony Luck <tony.luck@intel.com>
CC: Hirokazu Takata <takata@linux-m32r.org>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Michal Simek <monstr@monstr.eu>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
CC: Kyle McMartin <kyle@mcmartin.ca>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Chen Liqin <liqin.chen@sunplusct.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Ingo Molnar <mingo@redhat.com>
CC: Chris Zankel <chris@zankel.net>
Signed-off-by: Len Brown <len.brown@intel.com>


# 628c6246 31-Jul-2011 H. Peter Anvin <hpa@zytor.com>

x86, random: Architectural inlines to get random integers with RDRAND

Architectural inlines to get random ints and longs using the RDRAND
instruction.

Intel has introduced a new RDRAND instruction, a Digital Random Number
Generator (DRNG), which is functionally an high bandwidth entropy
source, cryptographic whitener, and integrity monitor all built into
hardware. This enables RDRAND to be used directly, bypassing the
kernel random number pool.

For technical documentation, see:

http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/

In this patch, this is *only* used for the nonblocking random number
pool. RDRAND is a nonblocking source, similar to our /dev/urandom,
and is therefore not a direct replacement for /dev/random. The
architectural hooks presented in the previous patch only feed the
kernel internal users, which only use the nonblocking pool, and so
this is not a problem.

Since this instruction is available in userspace, there is no reason
to have a /dev/hw_rng device driver for the purpose of feeding rngd.
This is especially so since RDRAND is a nonblocking source, and needs
additional whitening and reduction (see the above technical
documentation for details) in order to be of "pure entropy source"
quality.

The CONFIG_EXPERT compile-time option can be used to disable this use
of RDRAND.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Originally-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>


# d8d01a63 24-Jul-2011 Daniel Drake <dsd@laptop.org>

x86, olpc: Fix dependency on POWER_SUPPLY

As reported by Randy Dunlap, CONFIG_POWER_SUPPLY=m caused a
compile error:

arch/x86/built-in.o: In function `battery_status_changed':
olpc-xo15-sci.c:(.text+0x3acdd): undefined reference to `power_supply_get_by_name'
olpc-xo15-sci.c:(.text+0x3ad04): undefined reference to `power_supply_changed'

The SCI drivers, as bool, require POWER_SUPPLY to be builtin.
Use select to make that a hard requirement and avoid this build
failure.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0aba496f 27-May-2011 Shaohua Li <shaohua.li@intel.com>

x86/PCI: select direct access mode for mmconfig option

Direct access is needed in mmconf mode too. There are two reasons:
1. we need it to access first 256 bytes. We have bug before that
using mmconf to access pci config space hangs system (when
resizing BARs), so we use type1 config for legacy config space.
2. when doing mmconfg bar checking, we need access ACPI _CRS,
which might access PCI config space.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# ae7bd11b 21-Jul-2011 H. Peter Anvin <hpa@zytor.com>

clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option

The machinery for __ARCH_HAS_CLOCKSOURCE_DATA assumed a file in
asm-generic would be the default for architectures without their own
file in asm/, but that is not how it works.

Replace it with a Kconfig option instead.

Link: http://lkml.kernel.org/r/4E288AA6.7090804@zytor.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tony Luck <tony.luck@intel.com>


# 43605ef1 12-Jul-2011 Alan Cox <alan@linux.intel.com>

x86, config: Introduce an INTEL_MID configuration

We need to carve up the configuration between:

- MID general
- Moorestown specific
- Medfield specific
- Future devices

As a base point create an INTEL_MID configuration property. We
make the existing MRST configuration a sub-option. This means
that the rest of the kernel config can still use X86_MRST checks
without anything going backwards.

After this is merged future patches will tidy up which devices
are MID and which are X86_MRST, as well as add options for
Medfield.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/20110712164859.7642.84136.stgit@bob.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c378ddd5 14-Jul-2011 Tejun Heo <tj@kernel.org>

memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option

From 6839454ae63f1eb21e515c10229ca95c22955fec Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 14 Jul 2011 11:22:17 +0200

Make ARCH_DISCARD_MEMBLOCK a config option so that it can be handled
together with other MEMBLOCK options.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110714094603.GH3455@htj.dyndns.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 0608f70c 14-Jul-2011 Tejun Heo <tj@kernel.org>

x86: Use HAVE_MEMBLOCK_NODE_MAP

From 5732e1247898d67cbf837585150fe9f68974671d Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 14 Jul 2011 11:22:16 +0200

Convert x86 to HAVE_MEMBLOCK_NODE_MAP. The only difference in memory
handling is that allocations can't no longer cross node boundaries
whether they're node affine or not, which shouldn't matter at all.

This conversion will enable further simplification of boot memory
handling.

-v2: Fix build failure on !NUMA configurations discovered by hpa.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110714094423.GG3455@htj.dyndns.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 095c0aa8 11-Jul-2011 Glauber Costa <glommer@redhat.com>

sched: adjust scheduler cpu power for stolen time

This patch makes update_rq_clock() aware of steal time.
The mechanism of operation is not different from irq_time,
and follows the same principles. This lives in a CONFIG
option itself, and can be compiled out independently of
the rest of steal time reporting. The effect of disabling it
is that the scheduler will still report steal time (that cannot be
disabled), but won't use this information for cpu power adjustments.

Everytime update_rq_clock_task() is invoked, we query information
about how much time was stolen since last call, and feed it into
sched_rt_avg_update().

Although steal time reporting in account_process_tick() keeps
track of the last time we read the steal clock, in prev_steal_time,
this patch do it independently using another field,
prev_steal_time_rq. This is because otherwise, information about time
accounted in update_process_tick() would never reach us in update_rq_clock().

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


# 5da0ef9a 11-Jul-2011 Tejun Heo <tj@kernel.org>

x86: Disable AMD_NUMA for 32bit for now

Commit 2706a0bf7b ("x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit
too") enabled AMD NUMA for 32bit too. Unfortunately, SPARSEMEM
on 32bit had rather coarse (512MiB) addr->node mapping
granularity due to lack of space in page->flags. This led to
boot failure on certain AMD NUMA machines which had 128MiB
alignment on nodes.

Patches to properly detect this condition and reject NUMA
configuration are posted[1] but deemed too pervasive for merge
at this point (-rc6). Disable AMD NUMA for 32bit for now and
re-enable once the detection logic is merged.

[1] http://thread.gmane.org/gmane.linux.kernel/1161279/focus=1162583

Reported-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Conny Seidel <conny.seidel@amd.com>
Link: http://lkml.kernel.org/r/20110711083432.GC943@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2dc98fd3 08-Jul-2011 Michael Witten <mfwitten@gmail.com>

doc: Konfig: Documentation/power/{pm => apm-acpi}.txt

Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# a0f30f59 25-Jun-2011 Daniel Drake <dsd@laptop.org>

x86, olpc: Add XO-1.5 SCI driver

Add a driver for the ACPI-based EC event interface found on the
OLPC XO-1.5 laptop. This enables notification of battery/AC power events,
and enables various devices to be used as wakeup sources through regular
ACPI mechanisms.

This driver can't be built as a module, because some drivers need to know
at boot-time if SCI-based functionality is available via
olpc_ec_wakeup_available().

Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-12-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# cfee9597 25-Jun-2011 Daniel Drake <dsd@laptop.org>

x86, olpc: Add XO-1 RTC driver

Add a driver to configure the XO-1 RTC via CS5536 MSRs, to be used as a
system wakeup source via olpc-xo1-pm.

Device detection is based on finding the relevant device tree node.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-11-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: devicetree-discuss@lists.ozlabs.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# e1040ac6 25-Jun-2011 Daniel Drake <dsd@laptop.org>

x86, olpc-xo1-sci: Propagate power supply/battery events

EC events indicate change in AC power connectivity, battery state of
charge, battery error, battery presence, etc. Send notifications to
the power supply subsystem when changes are detected.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-10-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 2cf2baea 25-Jun-2011 Daniel Drake <dsd@laptop.org>

x86, olpc-xo1-sci: Add lid switch functionality

Configure the XO-1's lid switch GPIO to trigger an SCI interrupt,
and correctly expose this input device which can be used as a wakeup
source.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-9-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 7bc74b3d 25-Jun-2011 Daniel Drake <dsd@laptop.org>

x86, olpc-xo1-sci: Add GPE handler and ebook switch functionality

The EC in the OLPC XO-1 delivers GPE events to provide various
notifications. Add the basic code for GPE/EC event processing and
enable the ebook switch, which can be used as a wakeup source.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-8-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 7feda8e9 25-Jun-2011 Daniel Drake <dsd@laptop.org>

x86, olpc: Add XO-1 SCI driver and power button control

The System Control Interrupt is used in the OLPC XO-1 to control various
features of the laptop. Add the driver base and the power button
functionality.

This driver can't be built as a module, because functionality added in
future patches means that some drivers need to know at boot-time whether
SCI-based functionality is available.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-6-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 97c4cb71 25-Jun-2011 Daniel Drake <dsd@laptop.org>

x86, olpc: Add XO-1 suspend/resume support

Add code needed for basic suspend/resume of the XO-1 laptop.
Based on earlier work by Jordan Crouse, Andres Salomon, and others.

This patch incorporates all earlier feedback from Thomas Gleixner. To
clarify a certain point (now more obvious in the code itself):
On resume, OpenFirmware returns execution to Linux in protected mode
with a kernel-compatible GDT already set up. The changes and
simplifications suggested have all been included.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-5-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# a3128588 25-Jun-2011 Daniel Drake <dsd@laptop.org>

x86, olpc: Rename olpc-xo1 to olpc-xo1-pm

Based on earlier review comments, we'll no longer try to stick all of
our XO-1 goodies in a single driver. We'll split it into a power management
driver, and an EC/SCI driver.

As a first step, rename olpc-xo1 to olpc-xo1-pm, and make it builtin
instead of modular.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-4-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 0a779c57 09-Jun-2011 Thomas Gleixner <tglx@linutronix.de>

x86: Use common i8253 clockevent

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20110609130622.026152527@linutronix.de


# 06c3df49 05-Jun-2011 Jamie Iles <jamie@jamieiles.com>

clocksource: apb: Share APB timer code with other platforms

The APB timers are an IP block from Synopsys (DesignWare APB timers)
and are also found in other systems including ARM SoC's. This patch
adds functions for creating clock_event_devices and clocksources from
APB timers but does not do the resource allocation. This is handled
in a higher layer to allow the timers to be created from multiple
methods such as platform_devices.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>


# 166e9278 10-Jun-2011 Ohad Ben-Cohen <ohad@wizery.com>

x86/ia64: intel-iommu: move to drivers/iommu/

This should ease finding similarities with different platforms,
with the intention of solving problems once in a generic framework
which everyone can use.

Note: to move intel-iommu.c, the declaration of pci_find_upstream_pcie_bridge()
has to move from drivers/pci/pci.h to include/linux/pci.h. This is handled
in this patch, too.

As suggested, also drop DMAR's EXPERIMENTAL tag while we're at it.

Compile-tested on x86_64.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>


# 29b68415 05-Jun-2011 Ohad Ben-Cohen <ohad@wizery.com>

x86: amd_iommu: move to drivers/iommu/

This should ease finding similarities with different platforms,
with the intention of solving problems once in a generic framework
which everyone can use.

Compile-tested on x86_64.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>


# ab493a0f 01-Jun-2011 Ohad Ben-Cohen <ohad@wizery.com>

drivers: iommu: move to a dedicated folder

Create a dedicated folder for iommu drivers, and move the base
iommu implementation over there.

Grouping the various iommu drivers in a single location will help
finding similar problems shared by different platforms, so they
could be solved once, in the iommu framework, instead of solved
differently (or duplicated) in each driver.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>


# 8761f1ab 01-Jun-2011 Ralf Baechle <ralf@linux-mips.org>

pcspkr: Cleanup Kconfig dependencies

Lenghty lists of the kind "depends on ARCH1 || ARCH2 ... || ARCH123" are
usually either wrong or too coarse grained. Or plain an ugly sin.

[ tglx: Fixed up amigaone ]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Gerhard Pircher <gerhard_pircher@gmx.net>
Link: http://lkml.kernel.org/r/20110601180610.984881988@duck.linux-mips.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 15f304b6 01-Jun-2011 Ralf Baechle <ralf@linux-mips.org>

i8253: Consolidate all kernel definitions of i8253_lock

Move them to drivers/clocksource/i8253.c and remove the
implementations in arch/

[ tglx: Avoid the extra file in lib - folded arch patches in. The
export will become conditional in a later step ]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Link: http://lkml.kernel.org/r/20110601180610.221426078@duck.linux-mips.net
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 63e424c8 26-May-2011 Akinobu Mita <akinobu.mita@gmail.com>

arch: remove CONFIG_GENERIC_FIND_{NEXT_BIT,BIT_LE,LAST_BIT}

By the previous style change, CONFIG_GENERIC_FIND_NEXT_BIT,
CONFIG_GENERIC_FIND_BIT_LE, and CONFIG_GENERIC_FIND_LAST_BIT are not used
to test for existence of find bitops anymore.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 949a9d70 25-May-2011 Jean Delvare <khali@linux-fr.org>

i8k: Integrate with the hwmon subsystem

Let i8k create an hwmon class device so that libsensors will expose
the CPU temperature and fan speeds to monitoring applications.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Massimo Dal Zotto <dz@debian.org>


# dbee8a0a 24-May-2011 Roland Dreier <roland@purestorage.com>

x86: remove 32-bit versions of readq()/writeq()

The presense of a writeq() implementation on 32-bit x86 that splits the
64-bit write into two 32-bit writes turns out to break the mpt2sas driver
(and in general is risky for drivers as was discussed in
<http://lkml.kernel.org/r/adaab6c1h7c.fsf@cisco.com>). To fix this,
revert 2c5643b1c5c7 ("x86: provide readq()/writeq() on 32-bit too") and
follow-on cleanups.

This unfortunately leads to pushing non-atomic definitions of readq() and
write() to various x86-only drivers that in the meantime started using the
definitions in the x86 version of <asm/io.h>. However as discussed
exhaustively, this is actually the right thing to do, because the right
way to split a 64-bit transaction is hardware dependent and therefore
belongs in the hardware driver (eg mpt2sas needs a spinlock to make sure
no other accesses occur in between the two halves of the access).

Build tested on 32- and 64-bit x86 allmodconfig.

Link: http://lkml.kernel.org/r/x86-32-writeq-is-broken@mdm.bga.com
Acked-by: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Kashyap Desai <Kashyap.Desai@lsi.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Acked-by: James Bottomley <James.Bottomley@parallels.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bb0a56ec 19-May-2011 Dave Jones <davej@redhat.com>

[CPUFREQ] Move x86 drivers to drivers/cpufreq/

Signed-off-by: Dave Jones <davej@redhat.com>


# dc382fd5 16-May-2011 David Rientjes <rientjes@google.com>

x86, mm: Allow ZONE_DMA to be configurable

ZONE_DMA is unnecessary for a large number of machines that do not
require less than 32-bit DMA addressing, e.g. ISA legacy DMA or PCI
cards with a restricted DMA address mask.

This patch allows users to disable ZONE_DMA for x86 if they know they
will not be using such devices with their kernel.

This prevents the VM from unnecessarily reserving a ratio of memory
(defaulting to 1/256th of system capacity) with lowmem_reserve_ratio
for such allocations when it will never be used.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1105161353560.4353@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 82491451 08-May-2011 Russell King <rmk+kernel@arm.linux.org.uk>

clocksource: convert x86 to generic i8253 clocksource

Convert x86 i8253 clocksource code to use generic i8253 clocksource.

Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>


# 2e711c04 26-Apr-2011 Rafael J. Wysocki <rjw@rjwysocki.net>

PM: Remove sysdev suspend, resume and shutdown operations

Since suspend, resume and shutdown operations in struct sysdev_class
and struct sysdev_driver are not used any more, remove them. Also
drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used
for executing those operations and modify all of their users
accordingly. This reduces kernel code size quite a bit and reduces
its complexity.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>


# 9cddf15f 04-May-2011 Randy Dunlap <randy.dunlap@oracle.com>

x86/net: only select BPF_JIT when NET is enabled

Fix kconfig unmet dependency warning: HAVE_BPF_JIT depends on NET, so
make the "select" of it depend on NET also.

warning: (X86) selects HAVE_BPF_JIT which has unmet direct dependencies (NET)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1b7e03ef 02-May-2011 Tejun Heo <tj@kernel.org>

x86, NUMA: Enable emulation on 32bit too

Now that NUMA init path is unified, NUMA emulation can be enabled on
32bit. Make numa_emluation.c safe on 32bit by doing the followings.

* Define MAX_DMA32_PFN on 32bit too.

* Include bootmem.h for max_pfn declaration.

* Use u64 explicitly and always use PFN_PHYS() when converting page
number to address.

* Avoid __udivdi3() generation on 32bit by doing number of pages
calculation instead in split_nodes_interleave().

And drop X86_64 dependency from Kconfig.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>


# 2706a0bf 02-May-2011 Tejun Heo <tj@kernel.org>

x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too

Now that NUMA init path is unified, amdtopology can be enabled on
32bit. Make amdtopology.c safe on 32bit by explicitly using u64 and
drop X86_64 dependency from Kconfig.

Inclusion of bootmem.h is added for max_pfn declaration.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>


# 0a14842f 20-Apr-2011 Eric Dumazet <eric.dumazet@gmail.com>

net: filter: Just In Time compiler for x86-64

In order to speedup packet filtering, here is an implementation of a
JIT compiler for x86_64

It is disabled by default, and must be enabled by the admin.

echo 1 >/proc/sys/net/core/bpf_jit_enable

It uses module_alloc() and module_free() to get memory in the 2GB text
kernel range since we call helpers functions from the generated code.

EAX : BPF A accumulator
EBX : BPF X accumulator
RDI : pointer to skb (first argument given to JIT function)
RBP : frame pointer (even if CONFIG_FRAME_POINTER=n)
r9d : skb->len - skb->data_len (headlen)
r8 : skb->data

To get a trace of generated code, use :

echo 2 >/proc/sys/net/core/bpf_jit_enable

Example of generated code :

# tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24

flen=18 proglen=147 pass=3 image=ffffffffa00b5000
JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60
JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00
JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00
JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be
JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0
JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00
JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00
JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24
JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31
JIT code: ffffffffa00b5090: c0 c9 c3

BPF program is 144 bytes long, so native program is almost same size ;)

(000) ldh [12]
(001) jeq #0x800 jt 2 jf 8
(002) ld [26]
(003) and #0xffffff00
(004) jeq #0xc0a81400 jt 16 jf 5
(005) ld [30]
(006) and #0xffffff00
(007) jeq #0xc0a81400 jt 16 jf 17
(008) jeq #0x806 jt 10 jf 9
(009) jeq #0x8035 jt 10 jf 17
(010) ld [28]
(011) and #0xffffff00
(012) jeq #0xc0a81400 jt 16 jf 13
(013) ld [38]
(014) and #0xffffff00
(015) jeq #0xc0a81400 jt 16 jf 17
(016) ret #65535
(017) ret #0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9844b4e5 05-Apr-2011 Joerg Roedel <joerg.roedel@amd.com>

x86/amd-iommu: Select PCI_IOV with AMD IOMMU driver

In order to support ATS in the AMD IOMMU driver this patch
makes sure that the generic support for ATS is compiled in.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>


# ce9c99af 08-Apr-2011 Ian Campbell <ijc@hellion.org.uk>

x86, cpu: Move AMD Elan Kconfig under "Processor family"

Currently the option resides under X86_EXTENDED_PLATFORM due to historical
nonstandard A20M# handling. However that is no longer the case and so Elan can
be treated as part of the standard processor choice Kconfig option.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Link: http://lkml.kernel.org/r/1302245177.31620.47.camel@localhost.localdomain
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 3b16651f 01-Apr-2011 Tejun Heo <tj@kernel.org>

x86: Clean up memory model related configs in arch/x86/Kconfig

* Remove bogus dependency on ARCH_SELECT_MEMORY_MODEL from
ARCH_FLATMEM_ENABLE. ENABLE configs don't interfere with
SELECT_MEMORY_MODEL. They just need to indicate whether the
specific memory model is supported.

* Relocate HAVE_ARCH_ALLOC_REMAP, ARCH_PROC_KCORE_TEXT and
ARCH_SPARSEMEM_DEFAULT so that memory model related configs are
together in consistent order.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>


# 05293608 01-Apr-2011 Tejun Heo <tj@kernel.org>

x86-64, NUMA: Remove custom phys_to_nid() implementation

phys_to_nid() maps physical address to NUMA node id. This is
implemented by building perfect hash in compute_hash_shift() during
initialization.

However, with SPARSE memory model, the nid is encoded in page flags.
The perfect hash implementation was for DISCONTIG memory model which
got removed years ago by b263295dbf (x86: 64-bit, make sparsemem
vmemmap the only memory model).

So, the perfect hash ends up being used only during initialization
when the core SPARSE code already provides perfectly acceptable
generic early_pfn_to_nid() implementation.

Drop phys_to_nid() and use the generic ealry_pfn_to_nid() instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>


# 388b78ad 23-Mar-2011 Alexandre Bounine <alexandre.bounine@idt.com>

rapidio: modify configuration to support PCI-SRIO controller

1. Add an option to include RapidIO support if the PCI is available.
2. Add FSL_RIO configuration option to enable controller selection.
3. Add RapidIO support option into x86 and MIPS architectures.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d47d81c0 23-Mar-2011 Rafael J. Wysocki <rjw@rjwysocki.net>

Introduce ARCH_NO_SYSDEV_OPS config option (v2)

Introduce Kconfig option allowing architectures where sysdev
operations used during system suspend, resume and shutdown have been
completely replaced with struct sycore_ops operations to avoid
building sysdev code that will never be used.

Make callbacks in struct sys_device and struct sysdev_driver depend
on ARCH_NO_SYSDEV_OPS to allows us to verify if all of the references
have been actually removed from the code the given architecture
depends on.

Make x86 select ARCH_NO_SYSDEV_OPS.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>


# 1c00f016 22-Mar-2011 David Rientjes <rientjes@google.com>

x86: allow CONFIG_ISA_DMA_API to be disabled

Not all 64-bit systems require ISA-style DMA, so allow it to be
configurable. x86 utilizes the generic ISA DMA allocator from
kernel/dma.c, so require it only when CONFIG_ISA_DMA_API is enabled.

Disabling CONFIG_ISA_DMA_API is dependent on x86_64 since those machines
do not have ISA slots and benefit the most from disabling the option (and
on CONFIG_EXPERT as required by H. Peter Anvin).

When disabled, this also avoids declaring claim_dma_lock(),
release_dma_lock(), request_dma(), and free_dma() since those interfaces
will no longer be provided.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8df3bd9e 22-Mar-2011 David Rientjes <rientjes@google.com>

x86: only compile floppy driver if CONFIG_ISA_DMA_API is enabled

The generic floppy disk driver utilizies the interface provided by
CONFIG_ISA_DMA_API, specifically claim_dma_lock(), release_dma_lock(),
request_dma(), and free_dma(). Thus, there's a strict dependency on the
config option and the driver should only be loaded if the kernel supports
ISA-style DMA.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 45bb1674 13-Mar-2011 Daniel Drake <dsd@laptop.org>

x86, olpc: Use device tree for platform identification

Make OLPC fully depend on device tree, and use it to identify the OLPC
platform details. Some nodes are exposed as platform devices where we
plan to use device tree for device probing.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
LKML-Reference: <20110313151017.C255F9D401E@zog.reactivated.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# c0185808 06-Feb-2011 Thomas Gleixner <tglx@linutronix.de>

x86: Enable forced interrupt threading support

All non threadeable interrupts are marked. Enable forced irq threading
support.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 517e4981 16-Dec-2010 Thomas Gleixner <tglx@linutronix.de>

x86: Use generic show_interrupts

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c49aa5bd 08-Mar-2011 Jan Beulich <JBeulich@novell.com>

x86: Remove dead config option X86_CPU

This isn't being referenced anywhere, and the selects done from
it can be easily done together with all the other X86 ones.

v2: Also adjust UML's Kconfig.x86.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4D7603DA02000078000351C1@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ee009e4a0 07-Mar-2011 David Howells <dhowells@redhat.com>

KEYS: Add an iovec version of KEYCTL_INSTANTIATE

Add a keyctl op (KEYCTL_INSTANTIATE_IOV) that is like KEYCTL_INSTANTIATE, but
takes an iovec array and concatenates the data in-kernel into one buffer.
Since the KEYCTL_INSTANTIATE copies the data anyway, this isn't too much of a
problem.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>


# da6b737b 22-Feb-2011 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

x86: Add device tree support

This patch adds minimal support for device tree on x86. The device
tree blob is passed to the kernel via setup_data which requires at
least boot protocol 2.09.

Memory size, restricted memory regions, boot arguments are gathered
the traditional way so things like cmd_line are just here to let the
code compile.

The current plan is use the device tree as an extension and to gather
information which can not be enumerated and would have to be hardcoded
otherwise. This includes things like
- which devices are on this I2C/SPI bus?
- how are the interrupts wired to IO APIC?
- where could my hpet be?

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Cc: sodaville@linutronix.de
Cc: devicetree-discuss@lists.ozlabs.org
LKML-Reference: <1298405266-1624-3-git-send-email-bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1444e0c9 22-Feb-2011 Henrik Kretzschmar <henne@nachtwindheim.de>

x86: Fix deps of X86_UP_IOAPIC

Since commit 7cd92366a593246650cc7d6198e2c7d3af8c1d8a
lAPIC enabled accidently the IOAPIC, which now gets fixed.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
LKML-Reference: <1298385487-4708-5-git-send-email-henne@nachtwindheim.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c2a941fa 23-Feb-2011 Thomas Gleixner <tglx@linutronix.de>

x86: OLPC: Remove extra OLPC_OPENFIRMWARE_DT indirection

OLPC_OPENFIRMWARE_DT is just there to be selected by OLPC and selects
OF_PROMTREE. So let OLPC select OF_PROMTREE and remove that extra
config indirection. Fixup code and Makefile and use CONFIG_OF_PROMTREE
instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andres Salomon <dilinger@queued.net>


# dc3119e70 23-Feb-2011 Thomas Gleixner <tglx@linutronix.de>

x86: OLPC: Cleanup config maze completely

Neither CONFIG_OLPC_OPENFIRMWARE nor CONFIG_OLPC_OPENFIRMWARE_DT are
really necessary.

OLPC selects OLPC_OPENFIRMWARE unconditionally, so move the "select
OF" part under OLPC config option and fixup the dependencies in
Makefiles and code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andres Salomon <dilinger@queued.net>


# fe239545 23-Feb-2011 Thomas Gleixner <tglx@linutronix.de>

x86: OLPC: Hide OLPC_OPENFIRMWARE config switch

OLPC selects OLPC_OPENFIRMWARE unconditionally. If OLPC=n then
the OLPC_OPENFIRMWARE functionality is pointless.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andres Salomon <dilinger@queued.net>


# 54008979 23-Feb-2011 Thomas Gleixner <tglx@linutronix.de>

x86: OLPC: Remove redundant !X64_64 config dependency

OLPC is under if X86_32 already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andres Salomon <dilinger@queued.net>


# 4822b7fc 14-Feb-2011 H. Peter Anvin <hpa@linux.intel.com>

x86, trampoline: Common infrastructure for low memory trampolines

Common infrastructure for low memory trampolines. This code installs
the trampolines permanently in low memory very early. It also permits
multiple pieces of code to be used for this purpose.

This code also introduces a standard infrastructure for computing
symbol addresses in the trampoline code.

The only change to the actual SMP trampolines themselves is that the
64-bit trampoline has been made reusable -- the previous version would
overwrite the code with a status variable; this moves the status
variable to a separate location.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <4D5DFBE4.7090104@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthieu Castet <castet.matthieu@free.fr>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>


# 645a7919 23-Jan-2011 Tejun Heo <tj@kernel.org>

x86: Unify CPU -> NUMA node mapping between 32 and 64bit

Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for
CPU -> NUMA node mapping. Replace it with early_percpu variable
x86_cpu_to_node_map and share the mapping code with 64bit.

* USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.

* x86_cpu_to_node_map and numa_set/clear_node() are moved from
numa_64 to numa. For now, on 32bit, x86_cpu_to_node_map is initialized
with 0 instead of NUMA_NO_NODE. This is to avoid introducing unexpected
behavior change and will be updated once init path is unified.

* srat_detect_node() is now enabled for x86_32 too. It calls
numa_set_node() and initializes the mapping making explicit
cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-15-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>


# 6a108a14 20-Jan-2011 David Rientjes <rientjes@google.com>

kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT

The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel. A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 76d1f7bf 14-Jan-2011 H. Peter Anvin <hpa@linux.intel.com>

x86, olpc: Add missing Kconfig dependencies

OLPC uses select for OLPC_OPENFIRMWARE, which means OLPC has to
enforce the dependencies for OLPC_OPENFIRMWARE. Make sure it does so.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Daniel Drake <dsd@laptop.org>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
LKML-Reference: <20100923162846.D8D409D401B@zog.reactivated.net>
Cc: <stable@kernel.org> 2.6.37


# 64a5fed6 06-Jan-2011 Bjorn Helgaas <bjorn.helgaas@hp.com>

x86/PCI: make Broadcom CNB20LE driver EMBEDDED and EXPERIMENTAL

This functionality is known to be incomplete, so discourage its use in
general-purpose kernels.

The only reason to use this driver is to support PCI hotplug on CNB20LE-
based machines that don't have ACPI, and there are very few such
systems.

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=665109
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 419cdc54 29-Nov-2010 Andres Salomon <dilinger@queued.net>

x86: OLPC: convert olpc-xo1 driver from pci device to platform device

The cs5535-mfd driver now takes care of the PCI BAR handling; this
means the olpc-xo1 driver shouldn't be touching the PCI device at all.

This patch uses both cs5535-acpi and cs5535-pms platform devices rather
than a single platform device because the cs5535-mfd driver may be used
by other CS5535 platform-specific drivers; OLPC doesn't get to dictate
that ACPI and PMS will always be used together.

Signed-off-by: Andres Salomon <dilinger@queued.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>


# 30314804 12-Jan-2011 Lasse Collin <lasse.collin@tukaani.org>

x86: support XZ-compressed kernel

This integrates the XZ decompression code to the x86 pre-boot code.

mkpiggy.c is updated to reserve about 32 KiB more buffer safety margin for
kernel decompression. It is done unconditionally for all decompressors to
keep the code simpler.

The XZ decompressor needs around 30 KiB of heap, so the heap size is
increased to 32 KiB on both x86-32 and x86-64.

Documentation/x86/boot.txt is updated to list the XZ magic number.

With the x86 BCJ filter in XZ, XZ-compressed x86 kernel tends to be a few
percent smaller than the equivalent LZMA-compressed kernel.

Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 351f8f8e 12-Jan-2011 Amerigo Wang <amwang@redhat.com>

kernel: clean up USE_GENERIC_SMP_HELPERS

For arch which needs USE_GENERIC_SMP_HELPERS, it has to select
USE_GENERIC_SMP_HELPERS, rather than leaving a choice to user, since they
don't provide their own implementions.

Also, move on_each_cpu() to kernel/smp.c, it is strange to put it in
kernel/softirq.c.

For arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g. blackfin, only
on_each_cpu() is compiled.

Signed-off-by: Amerigo Wang <amwang@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c10d1e26 16-Nov-2010 Andres Salomon <dilinger@queued.net>

x86, olpc: Add OLPC device-tree support

Make use of PROC_DEVICETREE to export the tree, and sparc's PROMTREE code to
call into OLPC's Open Firmware to build the tree.

v5: fix buglet with root node check (introduced in v4)

v4: address some minor style issues pointed out by Grant, and explicitly cast
negative phandle checks to s32.

v3: rename olpc_prom to olpc_dt
- rework Kconfig entries
- drop devtree build hook from proc, instead adding a call to x86's
paging_init (similarly to how sparc64 does it)
- switch allocation from using slab to alloc_bootmem. this allows
the DT to be built earlier during boot (during setup_arch); the
downside is that there are some 1200 bootmem reservations that are
done during boot. Not ideal..
- add a helper olpc_ofw_is_installed function to test for the
existence and successful detection of OLPC's OFW.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20101116220952.26526a80@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# cc2067a5 16-Nov-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

perf, x86: Fixup Kconfig deps

This leads to a Kconfig dep inversion, x86 selects PERF_EVENT (due to
a hw_breakpoint dep) but doesn't unconditionally provide
HAVE_PERF_EVENT.

(This can cause build failures on M386/M486 kernel .config's.)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101117222055.982965150@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# eec1d4fa 29-Oct-2010 Hans Rosenfeld <hans.rosenfeld@amd.com>

x86, amd-nb: Complete the rename of AMD NB and related code

Not only the naming of the files was confusing, it was even more so for
the function and variable names.

Renamed the K8 NB and NUMA stuff that is also used on other AMD
platforms. This also renames the CONFIG_K8_NUMA option to
CONFIG_AMD_NUMA and the related file k8topology_64.c to
amdtopology_64.c. No functional changes intended.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>


# 82148d1d 24-Sep-2010 Shérab <Sebastien.Hinderer@ens-lyon.org>

x86/platform: Add Eurobraille/Iris power off support

The Iris machines from Eurobraille do not have APM or ACPI support
to shut themselves down properly. A special I/O sequence is
needed to do so. This modle runs this I/O sequence at
kernel shutdown when its force parameter is set to 1.

Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
[ did minor coding style edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ad02519a 15-Nov-2010 Randy Dunlap <randy.dunlap@oracle.com>

x86, mrst: Fix dependencies of "select INTEL_SCU_IPC"

commit b9fc71f47 (x86, mrst: The shutdown for MRST requires the SCU
IPC mechanism) introduced the following warning:

warning: (X86_MRST && PCI && PCI_GOANY && X86_32 &&
X86_EXTENDED_PLATFORM && X86_IO_APIC) selects INTEL_SCU_IPC which has
unmet direct dependencies (X86 && X86_PLATFORM_DEVICES && X86_MRST)

which is due to the hierarchical menu structure.

Select X86_PLATFORM_DEVICES as well.

Originally-from: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20101115101406.77e072ef.randy.dunlap@oracle.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>


# b9fc71f4 15-Nov-2010 Alan Cox <alan@lxorguk.ukuu.org.uk>

x86, mrst: The shutdown for MRST requires the SCU IPC mechanism

Fix the build failure reported by Randy.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
LKML-Reference: <20101115173110.6877.83958.stgit@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 37bc9f50 09-Nov-2010 Dirk Brandewie <dirk.brandewie@gmail.com>

x86: Ce4100: Add reboot_fixup() for CE4100

This patch adds the CE4100 reboot fixup to reboot_fixups_32.c

[ tglx: Moved PCI id to reboot_fixups_32.c ]

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
LKML-Reference: <5bdcfb4f0206fa721570504e95659a03b815bc5e.1289331834.git.dirk.brandewie@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c751e17b 09-Nov-2010 Thomas Gleixner <tglx@linutronix.de>

x86: Add CE4100 platform support

Add CE4100 platform support. CE4100 needs early setup like
moorestown.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
LKML-Reference: <94720fd7f5564a12ebf202cf2c4f4c0d619aab35.1289331834.git.dirk.brandewie@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1da4b1c6 09-Nov-2010 Feng Tang <feng.tang@intel.com>

x86/mrst: Add SFI platform device parsing code

SFI provides a series of tables. These describe the platform devices present
including SPI and I²C devices, as well as various sensors, keypads and other
glue as well as interfaces provided via the SCU IPC mechanism (intel_scu_ipc.c)

This patch is a merge of the core elements and relevant fixes from the
Intel development code by Feng, Alek, myself into a single coherent patch
for upstream submission.

It provides the needed infrastructure to register I2C, SPI and platform devices
described by the tables, as well as handlers for some of the hardware already
supported in kernel. The 0.8 firmware also provides GPIO tables.

Devices are created at boot time or if they are SCU dependant at the point an
SCU is discovered. The existing Linux device mechanisms will then handle the
device binding. At an abstract level this is an SFI to Linux device translator.

Device/platform specific setup/glue is in this file. This is done so that the
drivers for the generic I²C and SPI bus devices remain cross platform as they
should.

(Updated from RFC version to correct the emc1403 name used by the firmware
and a wrongly used #define)

Signed-off-by: Alek Du <alek.du@linux.intel.com>
LKML-Reference: <20101109112158.20013.6158.stgit@localhost.localdomain>
[Clean ups, removal of 0.7 support]
Signed-off-by: Feng Tang <feng.tang@linux.intel.com>
[Clean ups]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 66f2b061 20-Oct-2010 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 || HIGHMEM64G

Set CONFIG_ARCH_DMA_ADDR_T_64BIT when we set dma_addr_t to 64 bits in
<asm/types.h>; this allows Kconfig decisions based on this property.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <201010202255.o9KMtZXu009370@imap1.linux-foundation.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# e82b8e4e 04-Oct-2010 Venkatesh Pallipadi <venki@google.com>

x86: Add IRQ_TIME_ACCOUNTING

This patch adds IRQ_TIME_ACCOUNTING option on x86 and runtime enables it
when TSC is enabled.

This change just enables fine grained irq time accounting, isn't used yet.
Following patches use it for different purposes.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-6-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e360adbe 14-Oct-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

irq_work: Add generic hardirq context callbacks

Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.

Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.

The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.

Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[ various fixes ]
Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b5401a96 18-Mar-2010 Alex Nixon <alex.nixon@citrix.com>

xen/x86/PCI: Add support for the Xen PCI subsystem

The frontend stub lives in arch/x86/pci/xen.c, alongside other
sub-arch PCI init code (e.g. olpc.c).

It provides a mechanism for Xen PCI frontend to setup/destroy
legacy interrupts, MSI/MSI-X, and PCI configuration operations.

[ Impact: add core of Xen PCI support ]
[ v2: Removed the IOMMU code and only focusing on PCI.]
[ v3: removed usage of pci_scan_all_fns as that does not exist]
[ v4: introduced pci_xen value to fix compile warnings]
[ v5: squished fixes+features in one patch, changed Reviewed-by to Ccs]
[ v7: added Acked-by]
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Qing He <qing.he@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org


# 80e7b19a 23-Sep-2010 Daniel Drake <dsd@laptop.org>

PCI: OLPC: Only enable PCI configuration type override on XO-1

This configuration type override is for XO-1 only and must not happen
on XO-1.5.

Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 9e9006e9 14-Oct-2010 Randy Dunlap <randy.dunlap@oracle.com>

x86, olpc: XO-1 uses/depends on PCI

olpc-xo1 uses pci_*() interfaces so it should depend on PCI.

Otherwise we get build failure like:

arch/x86/kernel/olpc-xo1.c:65: error: implicit declaration of function 'pci_enable_device_io'
arch/x86/kernel/olpc-xo1.c:71: error: implicit declaration of function 'pci_request_region'
arch/x86/kernel/olpc-xo1.c:80: error: implicit declaration of function 'pci_release_region'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Daniel Drake <dsd@laptop.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20101014101313.adf7eb2a.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# cf4db259 14-Oct-2010 Steven Rostedt <srostedt@redhat.com>

ftrace: Rename config option HAVE_C_MCOUNT_RECORD to HAVE_C_RECORDMCOUNT

The config option used by archs to let the build system know that
the C version of the recordmcount works for said arch is currently
called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To
be more consistent with the name that all archs may use, it has been
renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since
we are building a C recordmcount and not a mcount_record.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Cc: <linux-arch@vger.kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 72441cb1 13-Oct-2010 Steven Rostedt <srostedt@redhat.com>

ftrace/x86: Add support for C version of recordmcount

This patch adds the support for the C version of recordmcount and
compile times show ~ 12% improvement.

After verifying this works, other archs can add:

HAVE_C_MCOUNT_RECORD

in its Kconfig and it will use the C version of recordmcount
instead of the perl version.

Cc: <linux-arch@vger.kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 3cba11d3 13-Oct-2010 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

kconfig/x86: Add HAVE_TEXT_POKE_SMP config for stop_machine dependency

Since the text_poke_smp() definately depends on actual
stop_machine() on smp, add that dependency to Kconfig.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <20101014031042.4100.90877.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 03f1a17c 13-Oct-2010 Randy Dunlap <randy.dunlap@oracle.com>

x86/vsmp: Eliminate kconfig dependency warning

Fix kconfig dependency warning to satisfy dependencies:

warning: (X86_VSMP && X86_64 && PCI && X86_EXTENDED_PLATFORM ||
XEN && PARAVIRT_GUEST && (X86_64 || X86_32 && X86_PAE && !X86_VISWS) && X86_CMPXCHG && X86_TSC || KVM_CLOCK && PARAVIRT_GUEST || KVM_GUEST && PARAVIRT_GUEST || LGUEST_GUEST && PARAVIRT_GUEST && X86_32) selects PARAVIRT which has unmet direct dependencies (PARAVIRT_GUEST)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
LKML-Reference: <20101013210023.9a033222.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# bf1ebf00 10-Oct-2010 Daniel Drake <dsd@laptop.org>

x86, olpc: Add XO-1 poweroff support

Add a pm_power_off handler for the OLPC XO-1 laptop.

The driver can be built modular and follows the behaviour of the
APM driver, setting pm_power_off to NULL on unload. However, the
ability to unload the module will probably be removed (with a simple
__module_get(THIS_MODULE)) if/when XO-1 suspend/resume support is
added to this file at a later date.

Signed-off-by: Daniel Drake <dsd@laptop.org>
LKML-Reference: <20101010094032.9AE669D401B@zog.reactivated.net>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# fbc6bff0 28-Sep-2010 Thomas Gleixner <tglx@linutronix.de>

x86: ioapic: Cleanup sparse irq code

Switch over to the new allocator and remove all the magic which was
caused by the unability to destroy irq descriptors. Get rid of the
create_irq_nr() loop for sparse and non sparse irq.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>


# 3bb9808e 26-Sep-2010 Thomas Gleixner <tglx@linutronix.de>

x86: Use genirq Kconfig

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100927121843.314600915@linutronix.de>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>


# 3e3c4860 23-Sep-2010 Daniel Drake <dsd@laptop.org>

x86, olpc: Rework BIOS signature check

The XO-1.5 laptop is not currently detected as an OLPC machine because
it fails this XO-1-centric check.

Now that we have OLPC OFW support in the kernel, a more sensible
check is to see if we found OFW during boot and check the architecture
property.

Also remove a now-meaningless codepath, as we're always going to have
OFW support with OLPC.

Signed-off-by: Daniel Drake <dsd@laptop.org>
LKML-Reference: <20100923162846.D8D409D401B@zog.reactivated.net>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 76fb6570 23-Sep-2010 Daniel Drake <dsd@laptop.org>

x86, olpc: Only enable PCI configuration type override on XO-1

This configuration type override is for XO-1 only and must not happen
on XO-1.5.

Signed-off-by: Daniel Drake <dsd@laptop.org>
LKML-Reference: <20100923162805.0F6549D401B@zog.reactivated.net>
Cc: Andres Solomon <dilinger@queued.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 46eb3b64 22-Sep-2010 Steven Rostedt <srostedt@redhat.com>

jump label/x86/sparc64: Remove !CC_OPTIMIZE_FOR_SIZE config conditions

The !CC_OPTIMIZE_FOR_SIZE was added to enable the jump label functionality
because Jason noticed that the gcc option would not optimize the labels
and may even hurt performance.

But this is a gcc problem not a kernel one. Removing this condition should
add motivation to the gcc developers to actually fix it.

Cc: Jason Baron <jbaron@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# d9f5ab7b 17-Sep-2010 Jason Baron <jbaron@redhat.com>

jump label: x86 support

add x86 support for jump label. I'm keeping this patch separate so its clear
to arch maintainers what was required for x86 support this new feature.
Hopefully, it wouldn't be too painful for other archs.

Signed-off-by: Jason Baron <jbaron@redhat.com>
LKML-Reference: <f838f49f40fbea0254036194be66dc48b598dcea.1284733808.git.jbaron@redhat.com>

[ cleaned up some formatting ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 23ac4ae8 17-Sep-2010 Andreas Herrmann <andreas.herrmann3@amd.com>

x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB

The file names are somehow misleading as the code is not specific to
AMD K8 CPUs anymore. The files accomodate code for other AMD CPU
northbridges as well.

Same is true for the config option which is valid for AMD CPU
northbridges in general and not specific to K8.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160343.GD4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 838a2e55 04-Sep-2010 Arnaud Lacombe <lacombar@gmail.com>

kbuild: migrate all arch to the kconfig mainmenu upgrade

Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Michal Marek <mmarek@suse.cz>


# 774ea0bc 25-Aug-2010 Yinghai Lu <yinghai@kernel.org>

x86: Remove old bootmem code

Requested by Ingo, Thomas and HPA.

The old bootmem code is no longer necessary, and the transition is
complete. Remove it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 72d7c3b3 25-Aug-2010 Yinghai Lu <yinghai@kernel.org>

x86: Use memblock to replace early_res

1. replace find_e820_area with memblock_find_in_range
2. replace reserve_early with memblock_x86_reserve_range
3. replace free_early with memblock_x86_free_range.
4. NO_BOOTMEM will switch to use memblock too.
5. use _e820, _early wrap in the patch, in following patch, will
replace them all
6. because memblock_x86_free_range support partial free, we can remove some special care
7. Need to make sure that memblock_find_in_range() is called after memblock_x86_fill()
so adjust some calling later in setup.c::setup_arch()
-- corruption_check and mptable_update

-v2: Move reserve_brk() early
Before fill_memblock_area, to avoid overlap between brk and memblock_find_in_range()
that could happen We have more then 128 RAM entry in E820 tables, and
memblock_x86_fill() could use memblock_find_in_range() to find a new place for
memblock.memory.region array.
and We don't need to use extend_brk() after fill_memblock_area()
So move reserve_brk() early before fill_memblock_area().
-v3: Move find_smp_config early
To make sure memblock_find_in_range not find wrong place, if BIOS doesn't put mptable
in right place.
-v4: Treat RESERVED_KERN as RAM in memblock.memory. and they are already in
memblock.reserved already..
use __NOT_KEEP_MEMBLOCK to make sure memblock related code could be freed later.
-v5: Generic version __memblock_find_in_range() is going from high to low, and for 32bit
active_region for 32bit does include high pages
need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped()
-v6: Use current_limit instead
-v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L
-v8: Set memblock_can_resize early to handle EFI with more RAM entries
-v9: update after kmemleak changes in mainline

Suggested-by: David S. Miller <davem@davemloft.net>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 9ea77bdb 25-Aug-2010 H. Peter Anvin <hpa@linux.intel.com>

x86, bios: Make the x86 early memory reservation a kernel option

Add a kernel command-line option so the x86 early memory reservation
size can be adjusted at runtime instead of only at compile time.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <tip-d0cd7425fab774a480cce17c2f649984312d0b55@git.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# d0cd7425 24-Aug-2010 H. Peter Anvin <hpa@linux.intel.com>

x86, bios: By default, reserve the low 64K for all BIOSes

The laundry list of BIOSes that need the low 64K reserved is getting
very long, so make it the default across all BIOSes. This also allows
the code to be simplified and unified with the reservation code for
the first 4K.

This resolves kernel bugzilla 16661 and who knows what else...

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <tip-*@git.kernel.org>


# 9863c90f 23-Aug-2010 Alok Kataria <akataria@vmware.com>

x86, vmware: Remove deprecated VMI kernel support

With the recent innovations in CPU hardware acceleration technologies
from Intel and AMD, VMware ran a few experiments to compare these
techniques to guest paravirtualization technique on VMware's platform.
These hardware assisted virtualization techniques have outperformed the
performance benefits provided by VMI in most of the workloads. VMware
expects that these hardware features will be ubiquitous in a couple of
years, as a result, VMware has started a phased retirement of this
feature from the hypervisor.

Please note that VMI has always been an optimization and non-VMI kernels
still work fine on VMware's platform.
Latest versions of VMware's product which support VMI are,
Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence
releases for these products will continue supporting VMI.

For more details about VMI retirement take a look at this,
http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html

This feature removal was scheduled for 2.6.37 back in September 2009.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1282600151.19396.22.camel@ank32.eng.vmware.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# ddb0c5a6 21-Aug-2010 Samuel Thibault <samuel.thibault@ens-lyon.org>

Replace Configure with Enable in description of MAXSMP

The "Configure" word tends to make user believe they have to say 'yes'
to be able to choose the number of procs/nodes. "Enable" should be
unambiguous enough.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d7c53c9e 19-Aug-2010 Borislav Petkov <bp@amd64.org>

x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues

When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.

Since these codepaths are not protected from concurrent access due to
the fact that there's no sane reason for making an already complex
code unnecessarily more complex - we hit the issue only when insanely
switching cores off- and online - serialize hotplugging cores on the
sysfs level and be done with it.

[ v2.1: fix !HOTPLUG_CPU build ]

Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100819181029.GC17171@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 592913ec 13-Jul-2010 John Stultz <johnstul@us.ibm.com>

time: Kill off CONFIG_GENERIC_TIME

Now that all arches have been converted over to use generic time via
clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME
config option and simplify the generic code.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-4-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# fd699c76 18-Jun-2010 Andres Salomon <dilinger@queued.net>

x86, olpc: Add support for calling into OpenFirmware

Add support for saving OFW's cif, and later calling into it to run OFW
commands. OFW remains resident in memory, living within virtual range
0xff800000 - 0xffc00000. A single page directory entry points to the
pgdir that OFW actually uses, so rather than saving the entire page
table, we grab and install that one entry permanently in the kernel's
page table.

This is currently only used by the OLPC XO. Note that this particular
calling convention breaks PAE and PAT, and so cannot be used on newer
x86 hardware.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20100618174653.7755a39a@dev.queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# e534c7c5 26-May-2010 Lee Schermerhorn <lee.schermerhorn@hp.com>

numa: x86_64: use generic percpu var numa_node_id() implementation

x86 arch specific changes to use generic numa_node_id() based on generic
percpu variable infrastructure. Back out x86's custom version of
numa_node_id()

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4a14d84e 26-May-2010 Andrew Morton <akpm@linux-foundation.org>

x86_32: use asm-generic/scatterlist.h

Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 18e98307 26-May-2010 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

asm-generic: add NEED_SG_DMA_LENGTH to define sg_dma_len()

There are only two ways to define sg_dma_len(); use sg->dma_length or
sg->length. This patch introduces NEED_SG_DMA_LENGTH that enables
architectures to choose sg->dma_length or sg->length.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3f6ea84a 01-Apr-2010 Ira W. Snyder <iws@ovro.caltech.edu>

PCI: read memory ranges out of Broadcom CNB20LE host bridge

Read the memory ranges behind the Broadcom CNB20LE host bridge out of the
hardware. This allows PCI hotplugging to work, since we know which memory
range to allocate PCI BAR's from.

The x86 PCI code automatically prefers the ACPI _CRS information when it is
available. In that case, this information is not used.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# c01d4323 15-May-2010 Frederic Weisbecker <fweisbec@gmail.com>

lockup_detector: Adapt CONFIG_PERF_EVENT_NMI to other archs

CONFIG_PERF_EVENT_NMI is something that need to be enabled from the
arch. This is fine on x86 as PERF_EVENTS is builtin but if other
archs select it, they will need to handle the PERF_EVENTS dependency.

Instead, handle the dependency in the generic layer:

- archs need to tell what they support through HAVE_PERF_EVENTS_NMI
- Enable magically PERF_EVENTS_NMI if we have PERF_EVENTS and
HAVE_PERF_EVENTS_NMI.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>


# 0102752e 11-Apr-2010 Frederic Weisbecker <fweisbec@gmail.com>

hw-breakpoints: Separate constraint space for data and instruction breakpoints

There are two outstanding fashions for archs to implement hardware
breakpoints.

The first is to separate breakpoint address pattern definition
space between data and instruction breakpoints. We then have
typically distinct instruction address breakpoint registers
and data address breakpoint registers, delivered with
separate control registers for data and instruction breakpoints
as well. This is the case of PowerPc and ARM for example.

The second consists in having merged breakpoint address space
definition between data and instruction breakpoint. Address
registers can host either instruction or data address and
the access mode for the breakpoint is defined in a control
register. This is the case of x86 and Super H.

This patch adds a new CONFIG_HAVE_MIXED_BREAKPOINTS_REGS config
that archs can select if they belong to the second case. Those
will have their slot allocation merged for instructions and
data breakpoints.

The others will have a separate slot tracking between data and
instruction breakpoints.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>


# 6fc108a0 21-Apr-2010 Jan Beulich <JBeulich@novell.com>

x86: Clean up arch/x86/Kconfig*

No functional change intended.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF2690020000780003B340@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# d61931d8 05-Mar-2010 Borislav Petkov <borislav.petkov@amd.com>

x86: Add optimized popcnt variants

Add support for the hardware version of the Hamming weight function,
popcnt, present in CPUs which advertize it under CPUID, Function
0x0000_0001_ECX[23]. On CPUs which don't support it, we fallback to the
default lib/hweight.c sw versions.

A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost
a 3x speedup on a F10h machine.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100318112015.GC11152@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 51591e31 25-Mar-2010 David Rientjes <rientjes@google.com>

x86: Increase CONFIG_NODES_SHIFT max to 10

Some larger systems require more than 512 nodes, so increase the
maximum CONFIG_NODES_SHIFT to 10 for a new max of 1024 nodes.

This was tested with numa=fake=64M on systems with more than
64GB of RAM. A total of 1022 nodes were initialized.

Successfully builds with no additional warnings on x86_64
allyesconfig.

( No effect on any existing config. Newly enabled CONFIG_MAXSMP=y
will see the new default. )

Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1003251538060.8589@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0e152cd7 12-Mar-2010 Borislav Petkov <bp@amd64.org>

x86, k8 nb: Fix boot crash: enable k8_northbridges unconditionally on AMD systems

de957628ce7c84764ff41331111036b3ae5bad0f changed setting of the
x86_init.iommu.iommu_init function ptr only when GART IOMMU is
found.

One side effect of it is that num_k8_northbridges
is not initialized anymore if not explicitly
called. This resulted in uninitialized pointers in
<arch/x86/kernel/cpu/intel_cacheinfo.c:amd_calc_l3_indices()>,
for example, which uses the num_k8_northbridges thing through
node_to_k8_nb_misc().

Fix that through an initcall that runs right after the PCI
subsystem and does all the scanning. Then, remove initialization
in gart_iommu_init() which is a rootfs_initcall and we're
running before that.

What is more, since num_k8_northbridges is being used in other
places beside GART IOMMU, include it whenever we add AMD CPU
support. The previous dependency chain in kconfig contained

K8_NB depends on AGP_AMD64|GART_IOMMU

which was clearly incorrect. The more natural way in terms of
hardware dependency should be

AGP_AMD64|GART_IOMMU depends on K8_NB depends on CPU_SUP_AMD &&
PCI. Make it so Number One!

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100312144303.GA29262@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Joerg Roedel <joerg.roedel@amd.com>


# 3bc4e459 10-Mar-2010 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

pci-dma: x86: use include/linux/pci-dma.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ba7e4d13 06-Jun-2009 Ingo Molnar <mingo@elte.hu>

perf, x86: Add INSTRUCTION_DECODER config flag

The PEBS+LBR decoding magic needs the insn_get_length() infrastructure
to be able to decode x86 instruction length.

So split it out of KPROBES dependency and make it enabled when either
KPROBES or PERF_EVENTS is enabled.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d8111cd9 02-Mar-2010 Jacob Pan <jacob.jun.pan@linux.intel.com>

x86, mrst: Remove X86_MRST dependency on PCI_IOAPIC

PCI_IOAPIC is used for PCI hotplug, Moorestown does not have ACPI PCI
hotplug, as it does not have ACPI. This unnecessary dependency causes
X86_MRST fail to be selected if ACPI is not selected.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1267550368-7435-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 4b2f3f7d 25-Feb-2010 Jacob Pan <jacob.jun.pan@linux.intel.com>

x86, mrst: Add Kconfig dependencies for Moorestown

The Moorestown platform requires IOAPIC for all interrupts from the
south complex, since there is no legacy PIC.

Furthermore, Moorestown I/O requires PCI. Moorestown PCI depends on PCI MMCONFIG
and DIRECT method to perform device enumeration, as there is no PCI BIOS.

[ hpa: rewrote commit message ]

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1267120934-9505-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# a92d152e 24-Feb-2010 Pan, Jacob jun <jacob.jun.pan@intel.com>

x86, numaq: Make CONFIG_X86_NUMAQ depend on CONFIG_PCI

The NUMAQ initialization sets x86_init.pci.init to pci_numaq_init,
which obviously isn't defined if CONFIG_PCI isn't defined. This
dependency was implicit in the past, because pci_numaq_init was
invoked from arch/x86/pci/legacy.c, which itself was conditioned on
CONFIG_PCI.

I suspect that no NUMA-Q machines without PCI were ever built, so
instead of complicating the code by adding #ifdefs or stub functions,
just disable this bit of the configuration space.

[ hpa: rewrote the checkin comment ]

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A321EE1F@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# c0f7ac3a 25-Feb-2010 Masami Hiramatsu <mhiramat@redhat.com>

kprobes/x86: Support kprobes jump optimization on x86

Introduce x86 arch-specific optimization code, which supports
both of x86-32 and x86-64.

This code also supports safety checking, which decodes whole of
a function in which probe is inserted, and checks following
conditions before optimization:
- The optimized instructions which will be replaced by a jump instruction
don't straddle the function boundary.
- There is no indirect jump instruction, because it will jumps into
the address range which is replaced by jump operand.
- There is no jump/loop instruction which jumps into the address range
which is replaced by jump operand.
- Don't optimize kprobes if it is in functions into which fixup code will
jumps.

This uses text_poke_multibyte() which doesn't support modifying
code on NMI/MCE handler. However, since kprobes itself doesn't
support NMI/MCE code probing, it's not a problem.

Changes in v9:
- Use *_text_reserved() for checking the probe can be optimized.
- Verify jump address range is in 2G range when preparing slot.
- Backup original code when switching optimized buffer, instead of
preparing buffer, because there can be int3 of other probes in
preparing phase.
- Check kprobe is disabled in arch_check_optimized_kprobe().
- Strictly check indirect jump opcodes (ff /4, ff /5).

Changes in v6:
- Split stop_machine-based jump patching code.
- Update comments and coding style.

Changes in v5:
- Introduce stop_machine-based jump replacing.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133446.6725.78994.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# bb24c471 02-Sep-2009 Jacob Pan <jacob.jun.pan@intel.com>

x86, apbt: Moorestown APB system timer driver

Moorestown platform does not have PIT or HPET platform timers. Instead it
has a bank of eight APB timers. The number of available timers to the os
is exposed via SFI mtmr tables. All APB timer interrupts are routed via
ioapic rtes and delivered as MSI.
Currently, we use timer 0 and 1 for per cpu clockevent devices, timer 2
for clocksource.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318D2D2@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# f850c30c 10-Feb-2010 Heiko Carstens <hca@linux.ibm.com>

tracing/kprobes: Make Kconfig dependencies generic

KPROBES_EVENT actually depends on the regs and stack access API
(b1cf540f) and not on x86.
So introduce a new config option which architectures can select if
they have the API implemented and switch x86.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
LKML-Reference: <20100210162517.GB6933@osiris.boeblingen.de.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# 580e0ad2 16-Feb-2010 Yinghai Lu <yinghai@kernel.org>

core: Move early_res from arch/x86 to kernel/

This makes the range reservation feature available to other
architectures.

-v2: add get_max_mapped, max_pfn_mapped only defined in x86...
to fix PPC compiling
-v3: according to hpa, add CONFIG_HAVE_EARLY_RES
-v4: fix typo about EARLY_RES in config

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B7B5723.4070009@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# c3128fb6 12-Feb-2010 Don Zickus <dzickus@redhat.com>

nmi_watchdog: Use a boolean config flag for compiling

Determines if an arch has setup arch specific perf_events and
nmi_watchdog code. This should restrict compiles to only those
arches ready.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
LKML-Reference: <1266013161-31197-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 59be5a8e 10-Feb-2010 Yinghai Lu <yinghai@kernel.org>

x86: Make 32bit support NO_BOOTMEM

Let's make 32bit consistent with 64bit.

-v2: Andrew pointed out for 32bit that we should use -1ULL

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-25-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 08677214 10-Feb-2010 Yinghai Lu <yinghai@kernel.org>

x86: Make 64 bit use early_res instead of bootmem before slab

Finally we can use early_res to replace bootmem for x86_64 now.

Still can use CONFIG_NO_BOOTMEM to enable it or not.

-v2: fix 32bit compiling about MAX_DMA32_PFN
-v3: folded bug fix from LKML message below

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B747239.4070907@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# b1600918 23-Jan-2010 H. Peter Anvin <hpa@zytor.com>

x86: Remove "x86 CPU features in debugfs" (CONFIG_X86_CPU_DEBUG)

CONFIG_X86_CPU_DEBUG, which provides some parsed versions of the x86
CPU configuration via debugfs, has caused boot failures on real
hardware. The value of this feature has been marginal at best, as all
this information is already available to userspace via generic
interfaces.

Causes crashes that have not been fixed + minimal utility -> remove.

See the referenced LKML thread for more information.

Reported-by: Ozan Çağlayan <ozan@pardus.org.tr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <alpine.LFD.2.00.1001221755320.13231@localhost.localdomain>
Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@kernel.org>


# a29815a3 10-Jan-2010 Avi Kivity <avi@qumranet.com>

core, x86: make LIST_POISON less deadly

The list macros use LIST_POISON1 and LIST_POISON2 as undereferencable
pointers in order to trap erronous use of freed list_heads. Unfortunately
userspace can arrange for those pointers to actually be dereferencable,
potentially turning an oops to an expolit.

To avoid this allow architectures (currently x86_64 only) to override
the default values for these pointers with truly-undereferencable values.
This is easy on x86_64 as the virtual address space is large and contains
areas that cannot be mapped.

Other 64-bit architectures will likely find similar unmapped ranges.

[ingo: switch to 0xdead000000000000 as the unmapped area]
[ingo: add comments, cleanup]
[jaswinder: eliminate sparse warnings]

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 13510997 08-Jan-2010 Albin Tonnerre <albin.tonnerre@free-electrons.com>

x86: add support for LZO-compressed kernels

The necessary changes to the x86 Kconfig and boot/compressed to allow the
use of this new compression method

Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Tested-by: Wu Zhangjin <wuzhangjin@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Russell King <rmk@arm.linux.org.uk>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 99e8c5a3 16-Dec-2009 Frederic Weisbecker <fweisbec@gmail.com>

hw-breakpoints: Fix hardware breakpoints -> perf events dependency

The kbuild's select command doesn't propagate through the config
dependencies.

Hence the current rules of hardware breakpoint's config can't
ensure perf can never be disabled under us.

We have:

config X86
selects HAVE_HW_BREAKPOINTS

config HAVE_HW_BREAKPOINTS
select PERF_EVENTS

config PERF_EVENTS
[...]

x86 will select the breakpoints but that won't propagate to perf
events. The user can still disable the latter, but it is
necessary for the breakpoints.

What we need is:

- x86 selects HAVE_HW_BREAKPOINTS and PERF_EVENTS
- HAVE_HW_BREAKPOINTS depends on PERF_EVENTS

so that we ensure PERF_EVENTS is enabled and frozen for x86.

This fixes the following kind of build errors:

In file included from arch/x86/kernel/hw_breakpoint.c:31:
include/linux/hw_breakpoint.h: In function 'hw_breakpoint_addr':
include/linux/hw_breakpoint.h:39: error: 'struct perf_event' has no member named 'attr'

v2: Select also ANON_INODES from x86, required for perf

Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Michal Marek <mmarek@suse.cz>
Reported-by: Andrew Randrianasulu <randrik_a@yahoo.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1261010034-7786-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c95d1e53 14-Dec-2009 Andres Salomon <dilinger@collabora.co.uk>

cs5535: drop the Geode-specific MFGPT/GPIO code

With generic modular drivers handling all of this stuff, the
geode-specific code can go away. The cs5535-gpio, cs5535-mfgpt, and
cs5535-clockevt drivers now handle this.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3c554946 14-Dec-2009 Andres Salomon <dilinger@collabora.co.uk>

ALSA: cs5535audio: free OLPC quirks from reliance on MGEODE_LX cpu optimization

Previously, OLPC support for the mic extensions was only enabled in the
ALSA driver if CONFIG_OLPC and CONFIG_MGEODE_LX were both set. This was
because the old geode GPIO code was written in a manner that assumed
CONFIG_MGEODE_LX. With the new cs553x-gpio driver, this is no longer the
case; as such, we can drop the requirement on CONFIG_MGEODE_LX and instead
include a requirement on GPIOLIB.

We use the generic GPIO API rather than the cs553x-specific API.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e84446de 10-Nov-2009 Randy Dunlap <randy.dunlap@oracle.com>

x86 VSDO: Fix Kconfig help

COMPAT_VDSO has 2 help text blocks, but kconfig only uses the
last one found, so merge the 2 blocks.

It would be real nice if kconfig would warn about this.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <4AF9FB6C.70003@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 14a3f40a 23-Oct-2009 Arjan van de Ven <arjan@infradead.org>

x86: Remove STACKPROTECTOR_ALL

STACKPROTECTOR_ALL has a really high overhead (runtime and stack
footprint) and is not really worth it protection wise (the
normal STACKPROTECTOR is in effect for all functions with
buffers already), so lets just remove the option entirely.

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Eric Sandeen <sandeen@redhat.com>
LKML-Reference: <20091023073101.3dce4ebb@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c03cb314 11-Oct-2009 Arjan van de Ven <arjan@infradead.org>

x86: Relegate CONFIG_PAT and CONFIG_MTRR configurability to EMBEDDED

MTRR and PAT support (which got added to CPUs over 10 years ago)
are no longer really optional in that more and more things are
depending on PAT just working, including various drivers and newer
versions of X. (to not even speak of MTRR)

Having this as a regular config option just no longer makes sense.

This patch relegates CONFIG_X86_PAT to the EMBEDDED category so
ultra-embedded can still disable it if they really need to.

Also-Suggested-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
LKML-Reference: <20091011103302.62bded41@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d0153ca3 29-Sep-2009 Alok Kataria <akataria@vmware.com>

x86, vmi: Mark VMI deprecated and schedule it for removal

Add text in feature-removal.txt indicating that VMI will be removed in
the 2.6.37 timeframe.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <1254193238.13456.48.camel@ank32.eng.vmware.com>
[ removed a bogus Kconfig change, marked (DEPRECATED) in Kconfig ]
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 7c68af6e 19-Sep-2009 Avi Kivity <avi@redhat.com>

core, x86: Add user return notifiers

Add a general per-cpu notifier that is called whenever the kernel is
about to return to userspace. The notifier uses a thread_info flag
and existing checks, so there is no impact on user return or context
switch fast paths.

This will be used initially to speed up KVM task switching by lazily
updating MSRs.

Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1253342422-13811-1-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 4701472e 26-Sep-2009 Jaswinder Singh Rajput <jaswinder@kernel.org>

x86, SLUB: Remove unused CONFIG FAST_CMPXCHG_LOCAL

Remove unused CONFIG FAST_CMPXCHG_LOCAL from Kconfig.

Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Robert P. J. Day" <rpjday@crashcourse.ca>
Cc: linux-mm@kvack.org
LKML-Reference: <1253981501.4568.61.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d949f36f 26-Sep-2009 Linus Torvalds <torvalds@linux-foundation.org>

x86: Fix hwpoison code related build failure on 32-bit NUMAQ

This build failure triggers:

In file included from include/linux/suspend.h:8,
from arch/x86/kernel/asm-offsets_32.c:11,
from arch/x86/kernel/asm-offsets.c:2:
include/linux/mm.h:503:2: error: #error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS

Because due to the hwpoison page flag we ran out of page
flags on 32-bit.

Dont turn on hwpoison on 32-bit NUMA (it's rare in any
case).

Also clean up the Kconfig dependencies in the generic MM
code by introducing ARCH_SUPPORTS_MEMORY_FAILURE.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9492587c 22-Sep-2009 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

kcore: register text area in generic way

Some 64bit arch has special segment for mapping kernel text. It should be
entried to /proc/kcore in addtion to direct-linear-map, vmalloc area.
This patch unifies KCORE_TEXT entry scattered under x86 and ia64.

I'm not familiar with other archs (mips has its own even after this patch)
but range of [_stext ..._end) is a valid area of text and it's not in
direct-map area, defining CONFIG_ARCH_PROC_KCORE_TEXT is only a necessary
thing to do.

Note: I left mips as it is now.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cdd6c482 20-Sep-2009 Ingo Molnar <mingo@elte.hu>

perf: Do the big rename: Performance Counters -> Performance Events

Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES

for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done

FILES=$(find . -name perf_event.*)

sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0c02a20f 19-Sep-2009 David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Kill DMAR_BROKEN_GFX_WA option.

Just make it depend on BROKEN for now, in case people scream really loud
about it (and because we might want to keep some of this logic for an
upcoming BIOS workaround, so I don't just want to rip it out entirely
just yet). But for graphics devices, it really ought to be unnecessary.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# 6ac162d6 03-Sep-2009 Pavel Vasilyev <pavel@pavlinux.ru>

x86/gart: Do not select AGP for GART_IOMMU

There is no dependency from the gart code to the agp code.
And since a lot of systems today do not have agp anymore
remove this dependency from the kernel configuration.

Signed-off-by: Pavel Vasilyev <pavel@pavlinux.ru>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>


# 69575d38 01-Sep-2009 Shane Wang <shane.wang@intel.com>

x86, intel_txt: clean up the impact on generic code, unbreak non-x86

Move tboot.h from asm to linux to fix the build errors of intel_txt
patch on non-X86 platforms. Remove the tboot code from generic code
init/main.c and kernel/cpu.c.

Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 3f4110a4 29-Aug-2009 Thomas Gleixner <tglx@linutronix.de>

x86: Add Moorestown early detection

Moorestown MID devices need to be detected early in the boot process
to setup and do not call x86_default_early_setup as there is no EBDA
region to reserve.

[ Copied the minimal code from Jacobs latest MRST series ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>


# 5f0db7a2 14-Aug-2009 Feng Tang <feng.tang@intel.com>

SFI: Hook PCI MMCONFIG

First check ACPI, and if that fails, ask SFI to find the MCFG.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>


# efafc8b2 14-Aug-2009 Feng Tang <feng.tang@intel.com>

x86: add arch-specific SFI support

arch/x86/kernel/sfi.c serves the dual-purpose of supporting the
SFI core with arch specific code, as well as a home for the
arch-specific code that uses SFI.

analogous to ACPI, drivers/sfi/Kconfig is pulled in by arch/x86/Kconfig

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org


# 46cf98cd 10-Jul-2009 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

x86, pat: Generalize the use of page flag PG_uncached

Only IA64 was using PG_uncached as of now. We now intend to use this bit
in x86 as well, to keep track of memory type of those addresses that
have page struct for them. So, generalize the use of that bit across
ia64 and x86.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 66700001 24-Aug-2009 Josh Stone <jistone@redhat.com>

tracing: Rename FTRACE_SYSCALLS for tracepoints

s/HAVE_FTRACE_SYSCALLS/HAVE_SYSCALL_TRACEPOINTS/g
s/TIF_SYSCALL_FTRACE/TIF_SYSCALL_TRACEPOINT/g

The syscall enter/exit tracing is no longer specific to just ftrace, so
they now have names that reflect their tie to tracepoints instead.

Signed-off-by: Josh Stone <jistone@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <1251150194-1713-2-git-send-email-jistone@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# 4518e6a0 14-Aug-2009 Tejun Heo <tj@kernel.org>

x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA

Embedding percpu first chunk allocator can now handle very sparse unit
mapping. Use embedding allocator instead of lpage for 64bit NUMA.
This removes extra TLB pressure and the need to do complex and fragile
dancing when changing page attributes.

For 32bit, using very sparse unit mapping isn't a good idea because
the vmalloc space is very constrained. 32bit NUMA machines aren't
exactly the focus of optimization and it isn't very clear whether
lpage performs better than page. Use page first chunk allocator for
32bit NUMAs.

As this leaves setup_pcpu_*() functions pretty much empty, fold them
into setup_per_cpu_areas().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>


# 08fc4580 14-Aug-2009 Tejun Heo <tj@kernel.org>

percpu: build first chunk allocators selectively

There's no need to build unused first chunk allocators in. Define
CONFIG_NEED_PER_CPU_*_FIRST_CHUNK and let archs enable them
selectively.

Signed-off-by: Tejun Heo <tj@kernel.org>


# 04da8a43 11-Aug-2009 Ingo Molnar <mingo@elte.hu>

perf_counter, x86: Fix/improve apic fallback

Johannes Stezenbach reported that his Pentium-M based
laptop does not have the local APIC enabled by default,
and hence perfcounters do not get initialized.

Add a fallback for this case: allow non-sampled counters
and return with an error on sampled counters. This allows
'perf stat' to work out of box - and allows 'perf top'
and 'perf record' to fall back on a hrtimer based sampling
method.

( Passing 'lapic' on the boot line will allow hardware
sampling to occur - but if the APIC is disabled
permanently by the hardware then this fallback still
allows more systems to use perfcounters. )

Also decouple perfcounter support from X86_LOCAL_APIC.

-v2: fix typo breaking counters on all other systems ...

Reported-by: Johannes Stezenbach <js@sig21.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c1ebf835 08-Jul-2009 Andi Kleen <andi@firstfloor.org>

x86: mce: Rename CONFIG_X86_NEW_MCE to CONFIG_X86_MCE

Drop the CONFIG_X86_NEW_MCE symbol and change all
references to it to check for CONFIG_X86_MCE directly.

No code changes

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 5bb38adc 08-Jul-2009 Andi Kleen <andi@firstfloor.org>

x86: mce: Remove old i386 machine check code

As announced in feature-remove-schedule.txt remove CONFIG_X86_OLD_MCE

This patch only removes code.

The ancient machine check code for very old systems that are not supported
by CONFIG_X86_NEW_MCE is still kept.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# bab9bc65 08-Jul-2009 Andi Kleen <andi@firstfloor.org>

x86: mce: Update X86_MCE description in x86/Kconfig

- Clarify that this config controls thermal throttling
reporting too
- Clarify the types of errors reported by machine checks
- Drop references to ancient CPUs.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# c31d9633 08-Jul-2009 Andi Kleen <andi@firstfloor.org>

x86: mce: Make CONFIG_X86_ANCIENT_MCE dependent on CONFIG_X86_MCE

Add a missing depency for ANCIENT_MCE. It didn't matter
in practice because the ANCIENT code wasn't compiled without
X86_MCE, but it's better to express that clearly in
Kconfig.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 62edf5dc 04-Jul-2009 David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Restore DMAR_BROKEN_GFX_WA option for broken graphics drivers

We need to give people a little more time to fix the broken drivers.
Re-introduce this, but tied in properly with the 'iommu=pt' support this
time. Change the config option name and make it default to 'no' too.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# c7ab48d2 26-Jun-2009 David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Clean up identity mapping code, remove CONFIG_DMAR_GFX_WA

There's no need for the GFX workaround now we have 'iommu=pt' for the
cases where people really care about performance. There's no need to
have a special case for just one type of device.

This also speeds up the iommu=pt path and reduces memory usage by
setting up the si_domain _once_ and then using it for all devices,
rather than giving each device its own private page tables.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# e74e3962 30-Mar-2009 Tejun Heo <tj@kernel.org>

percpu: use dynamic percpu allocator as the default percpu allocator

This patch makes most !CONFIG_HAVE_SETUP_PER_CPU_AREA archs use
dynamic percpu allocator. The first chunk is allocated using
embedding helper and 8k is reserved for modules. This ensures that
the new allocator behaves almost identically to the original allocator
as long as static percpu variables are concerned, so it shouldn't
introduce much breakage.

s390 and alpha use custom SHIFT_PERCPU_PTR() to work around addressing
range limit the addressing model imposes. Unfortunately, this breaks
if the address is specified using a variable, so for now, the two
archs aren't converted.

The following architectures are affected by this change.

* sh
* arm
* cris
* mips
* sparc(32)
* blackfin
* avr32
* parisc (broken, under investigation)
* m32r
* powerpc(32)

As this change makes the dynamic allocator the default one,
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is replaced with its invert -
CONFIG_HAVE_LEGACY_PER_CPU_AREA, which is added to yet-to-be converted
archs. These archs implement their own setup_per_cpu_areas() and the
conversion is not trivial.

* powerpc(64)
* sparc(64)
* ia64
* alpha
* s390

Boot and batch alloc/free tests on x86_32 with debug code (x86_32
doesn't use default first chunk initialization). Compile tested on
sparc(32), powerpc(32), arm and alpha.

Kyle McMartin reported that this change breaks parisc. The problem is
still under investigation and he is okay with pushing this patch
forward and fixing parisc later.

[ Impact: use dynamic allocator for most archs w/o custom percpu setup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bryan Wu <cooloney@kernel.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>


# 71e308a2 17-Jun-2009 Steven Rostedt <srostedt@redhat.com>

function-graph: add stack frame test

In case gcc does something funny with the stack frames, or the return
from function code, we would like to detect that.

An arch may implement passing of a variable that is unique to the
function and can be saved on entering a function and can be tested
when exiting the function. Usually the frame pointer can be used for
this purpose.

This patch also implements this for x86. Where it passes in the stack
frame of the parent function, and will test that frame on exit.

There was a case in x86_32 with optimize for size (-Os) where, for a
few functions, gcc would align the stack frame and place a copy of the
return address into it. The function graph tracer modified the copy and
not the actual return address. On return from the funtion, it did not go
to the tracer hook, but returned to the parent. This broke the function
graph tracer, because the return of the parent (where gcc did not do
this funky manipulation) returned to the location that the child function
was suppose to. This caused strange kernel crashes.

This test detected the problem and pointed out where the issue was.

This modifies the parameters of one of the functions that the arch
specific code calls, so it includes changes to arch code to accommodate
the new prototype.

Note, I notice that the parsic arch implements its own push_return_trace.
This is now a generic function and the ftrace_push_return_trace should be
used instead. This patch does not touch that code.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 7c095e46 17-Jun-2009 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

dma-mapping: x86: use asm-generic/dma-mapping-common.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0a4af3b0 26-Feb-2009 Pekka Enberg <penberg@cs.helsinki.fi>

kmemcheck: make kconfig accessible for other architectures

The Kconfig options of kmemcheck are hidden under arch/x86 which makes porting
to other architectures harder. To fix that, move the Kconfig bits to
lib/Kconfig.kmemcheck and introduce a CONFIG_HAVE_ARCH_KMEMCHECK config option
that architectures can define.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>


# 0067f129 01-Jun-2009 K.Prasad <prasad@linux.vnet.ibm.com>

hw-breakpoints: x86 architecture implementation of Hardware Breakpoint interfaces

This patch introduces the arch-specific implementation of the generic
hardware breakpoints in kernel/hw_breakpoint.c inside x86 specific directories.
It contains functions which help to validate and serve requests using
Hardware Breakpoint registers on x86 processors.

[ fweisbec@gmail.com: fix conflict against kmemcheck ]

Original-patch-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# cd13adcc 27-May-2009 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

x86: trivial clean up for arch/x86/Kconfig

Use tab.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# ea149b36 29-Apr-2009 Andi Kleen <ak@linux.intel.com>

x86, mce: add basic error injection infrastructure

Allow user programs to write mce records into /dev/mcelog. When they do
that a fake machine check is triggered to test the machine check code.

This uses the MCE MSR wrappers added earlier.

The implementation is straight forward. There is a struct mce record
per CPU and the MCE MSR accesses get data from there if there is valid
data injected there. This allows to test the machine check code
relatively realistically because only the lowest layer of hardware
access is intercepted.

The test suite and injector are available at
git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# de5619df 28-Apr-2009 Andi Kleen <ak@linux.intel.com>

x86, mce: enable MCE_AMD for 32bit NEW_MCE

That's very easy using the infrastructure enabled earlier for MCE_INTEL

Untested.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 7856f6cc 28-Apr-2009 Andi Kleen <ak@linux.intel.com>

x86, mce: enable MCE_INTEL for 32bit new MCE

Enable the 64bit MCE_INTEL code (CMCI, thermal interrupts) for 32bit NEW_MCE.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 4efc0670 28-Apr-2009 Andi Kleen <ak@linux.intel.com>

x86, mce: use 64bit machine check code on 32bit

The 64bit machine check code is in many ways much better than
the 32bit machine check code: it is more specification compliant,
is cleaner, only has a single code base versus one per CPU,
has better infrastructure for recovery, has a cleaner way to communicate
with user space etc. etc.

Use the 64bit code for 32bit too.

This is the second attempt to do this. There was one a couple of years
ago to unify this code for 32bit and 64bit. Back then this ran into some
trouble with K7s and was reverted.

I believe this time the K7 problems (and some others) are addressed.
I went over the old handlers and was very careful to retain
all quirks.

But of course this needs a lot of testing on old systems. On newer
64bit capable systems I don't expect much problems because they have been
already tested with the 64bit kernel.

I made this a CONFIG for now that still allows to select the old
machine check code. This is mostly to make testing easier,
if someone runs into a problem we can ask them to try
with the CONFIG switched.

The new code is default y for more coverage.

Once there is confidence the 64bit code works well on older hardware
too the CONFIG_X86_OLD_MCE and the associated code can be easily
removed.

This causes a behaviour change for 32bit installations. They now
have to install the mcelog package to be able to log
corrected machine checks.

The 64bit machine check code only handles CPUs which support the
standard Intel machine check architecture described in the IA32 SDM.
The 32bit code has special support for some older CPUs which
have non standard machine check architectures, in particular
WinChip C3 and Intel P5. I made those a separate CONFIG option
and kept them for now. The WinChip variant could be probably
removed without too much pain, it doesn't really do anything
interesting. P5 is also disabled by default (like it
was before) because many motherboards have it miswired, but
according to Alan Cox a few embedded setups use that one.

Forward ported/heavily changed version of old patch, original patch
included review/fixes from Thomas Gleixner, Bert Wesarg.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# b4ecc126 13-May-2009 Jeremy Fitzhardinge <jeremy@goop.org>

x86: Fix performance regression caused by paravirt_ops on native kernels

Xiaohui Xin and some other folks at Intel have been looking into what's
behind the performance hit of paravirt_ops when running native.

It appears that the hit is entirely due to the paravirtualized
spinlocks introduced by:

| commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8
| Date: Mon Jul 7 12:07:51 2008 -0700
|
| paravirt: introduce a "lock-byte" spinlock implementation

The extra call/return in the spinlock path is somehow
causing an increase in the cycles/instruction of somewhere around 2-7%
(seems to vary quite a lot from test to test). The working theory is
that the CPU's pipeline is getting upset about the
call->call->locked-op->return->return, and seems to be failing to
speculate (though I haven't seen anything definitive about the precise
reasons). This doesn't entirely make sense, because the performance
hit is also visible on unlock and other operations which don't involve
locked instructions. But spinlock operations clearly swamp all the
other pvops operations, even though I can't imagine that they're
nearly as common (there's only a .05% increase in instructions
executed).

If I disable just the pv-spinlock calls, my tests show that pvops is
identical to non-pvops performance on native (my measurements show that
it is actually about .1% faster, but Xiaohui shows a .05% slowdown).

Summary of results, averaging 10 runs of the "mmperf" test, using a
no-pvops build as baseline:

nopv Pv-nospin Pv-spin
CPU cycles 100.00% 99.89% 102.18%
instructions 100.00% 100.10% 100.15%
CPI 100.00% 99.79% 102.03%
cache ref 100.00% 100.84% 100.28%
cache miss 100.00% 90.47% 88.56%
cache miss rate 100.00% 89.72% 88.31%
branches 100.00% 99.93% 100.04%
branch miss 100.00% 103.66% 107.72%
branch miss rt 100.00% 103.73% 107.67%
wallclock 100.00% 99.90% 102.20%

The clear effect here is that the 2% increase in CPI is
directly reflected in the final wallclock time.

(The other interesting effect is that the more ops are
out of line calls via pvops, the lower the cache access
and miss rates. Not too surprising, but it suggests that
the non-pvops kernel is over-inlined. On the flipside,
the branch misses go up correspondingly...)

So, what's the fix?

Paravirt patching turns all the pvops calls into direct calls, so
_spin_lock etc do end up having direct calls. For example, the compiler
generated code for paravirtualized _spin_lock is:

<_spin_lock+0>: mov %gs:0xb4c8,%rax
<_spin_lock+9>: incl 0xffffffffffffe044(%rax)
<_spin_lock+15>: callq *0xffffffff805a5b30
<_spin_lock+22>: retq

The indirect call will get patched to:
<_spin_lock+0>: mov %gs:0xb4c8,%rax
<_spin_lock+9>: incl 0xffffffffffffe044(%rax)
<_spin_lock+15>: callq <__ticket_spin_lock>
<_spin_lock+20>: nop; nop /* or whatever 2-byte nop */
<_spin_lock+22>: retq

One possibility is to inline _spin_lock, etc, when building an
optimised kernel (ie, when there's no spinlock/preempt
instrumentation/debugging enabled). That will remove the outer
call/return pair, returning the instruction stream to a single
call/return, which will presumably execute the same as the non-pvops
case. The downsides arel 1) it will replicate the
preempt_disable/enable code at eack lock/unlock callsite; this code is
fairly small, but not nothing; and 2) the spinlock definitions are
already a very heavily tangled mass of #ifdefs and other preprocessor
magic, and making any changes will be non-trivial.

The other obvious answer is to disable pv-spinlocks. Making them a
separate config option is fairly easy, and it would be trivial to
enable them only when Xen is enabled (as the only non-default user).
But it doesn't really address the common case of a distro build which
is going to have Xen support enabled, and leaves the open question of
whether the native performance cost of pv-spinlocks is worth the
performance improvement on a loaded Xen system (10% saving of overall
system CPU when guests block rather than spin). Still it is a
reasonable short-term workaround.

[ Impact: fix pvops performance regression when running native ]

Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com>
Analysed-by: "Li Xin" <xin.li@intel.com>
Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4A0B62F7.5030802@goop.org>
[ fixed the help text ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 26717808 07-May-2009 H. Peter Anvin <hpa@zytor.com>

x86: make CONFIG_RELOCATABLE the default

Remove the EXPERIMENTAL tag from CONFIG_RELOCATABLE and make it the
default. Relocatable kernels have been used for a while now, and
should now have identical semantics to non-relocatable kernels when
loaded by a non-relocating bootloader.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# ceefccc9 11-May-2009 H. Peter Anvin <hpa@zytor.com>

x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB

Default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN each to 16 MB,
so that both non-relocatable and relocatable kernels are loaded at
16 MB by a non-relocating bootloader. This is somewhat hacky, but it
appears to be the only way to do this that does not break some some
set of existing bootloaders.

We want to avoid the bottom 16 MB because of large page breakup,
memory holes, and ZONE_DMA. Embedded systems may need to reduce this,
or update their bootloaders to be aware of the new min_alignment field.

[ Impact: performance improvement, avoids problems on some systems ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 845adf72 05-May-2009 H. Peter Anvin <hpa@zytor.com>

x86: add a Kconfig symbol for when relocations are needed

We only need to build relocations when we are building a 32-bit
relocatable kernel. Rather than unnecessarily complicating the
Makefiles, make an explicit Kbuild symbol for this.

[ Impact: permits future cleanup ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Sam Ravnborg <sam@ravnborg.org>


# 15e957d0 30-Apr-2009 Yinghai Lu <yinghai@kernel.org>

x86/irq: use move_irq_desc() in create_irq_nr()

move_irq_desc() will try to move irq_desc to the home node if
the allocated one is not correct, in create_irq_nr().

( This can happen on devices that are on different nodes that
are using MSI, when drivers are loaded and unloaded randomly. )

v2: fix non-smp build
v3: add NUMA_IRQ_DESC to eliminate #ifdefs

[ Impact: improve irq descriptor locality on NUMA systems ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F95EAE.2050903@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fcef5911 27-Apr-2009 Yinghai Lu <yinghai@kernel.org>

x86/irq: remove leftover code from NUMA_MIGRATE_IRQ_DESC

The original feature of migrating irq_desc dynamic was too fragile
and was causing problems: it caused crashes on systems with lots of
cards with MSI-X when user-space irq-balancer was enabled.

We now have new patches that create irq_desc according to device
numa node. This patch removes the leftover bits of the dynamic balancer.

[ Impact: remove dead code ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F654AF.8000808@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 51b26ada 26-Apr-2009 Linus Torvalds <torvalds@linux-foundation.org>

x86: unify arch/x86/boot/compressed/vmlinux_*.lds

Look at the:

diff -u arch/x86/boot/compressed/vmlinux_*.lds

output and realize that they're basially exactly the same except for
trivial naming differences, and the fact that the 64-bit version has a
"pgtable" thing.

So unify them.

There's some trivial cleanup there (make the output format a Kconfig thing
rather than doing #ifdef's for it, and unify both 32-bit and 64-bit BSS
end to "_ebss", where 32-bit used to use the traditional "_end"), but
other than that it's really very mindless and straigt conversion.

For example, I think we should aim to remove "startup_32" vs "startup_64",
and just call it "startup", and get rid of one more difference. I didn't
do that.

Also, notice the comment in the unified vmlinux.lds.S talks about
"head_64" and "startup_32" which is an odd and incorrect mix, but that was
actually what the old 64-bit only lds file had, so the confusion isn't
new, and now that mixing is arguably more accurate thanks to the
vmlinux.lds.S file being shared between the two cases ;)

[ Impact: cleanup, unification ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2a3313f4 21-Apr-2009 Michael K. Johnson <johnsonm@rpath.com>

x86: more than 8 32-bit CPUs requires X86_BIGSMP

$ cat x86-more-than-8-cpus-requires-bigsmp.patch

Enforce NR_CPUS <= 8 limitation if X86_BIGSMP not set

Configuring more than 8 logical CPUs on 32-bit x86 requires
X86_BIGSMP to be set in order to boot successfully, if more than 8
logical CPUs are actually found at boot time. The X86_BIGSMP help
text describes that it is required to be set if more than 8 CPUs
are configured, but this was previously not enforced.

This configuration error has affected multiple distributions:
https://bugzilla.redhat.com/show_bug.cgi?id=480844
https://issues.rpath.com/browse/RPL-3022

Signed-off-by: Michael K Johnson <johnsonm@rpath.com>
LKML-Reference: <20090422014448.GB32541@logo.rdu.rpath.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 9d6c26e7 20-Apr-2009 Suresh Siddha <suresh.b.siddha@intel.com>

x86: x2apic, IR: Make config X86_UV dependent on X86_X2APIC

Instead of selecting X86_X2APIC, make config X86_UV dependent
on X86_X2APIC.

This will eliminate enabling CONFIG_X86_X2APIC with out
enabling CONFIG_INTR_REMAP.

[ Impact: cleanup ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Cc: dwmw2@infradead.org
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Weidong Han <weidong.han@intel.com>
LKML-Reference: <20090420200450.694598000@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ca713c2a 15-Apr-2009 Yinghai Lu <yinghai@kernel.org>

x86/irq: mark NUMA_MIGRATE_IRQ_DESC broken

It causes crash on system with lots of cards with MSI-X
when irq_balancer enabled...

The patches fixing it were both complex and fragile, according
to Eric they were also doing quite dangerous things to the
hardware.

Instead we now have patches that solve this problem via static
NUMA node mappings - not dynamic allocation and balancing.

The patches are much simpler than this method but are still too
large outside of the merge window, so we mark the dynamic balancer
as broken for now, and queue up the new approach for v2.6.31.

[ Impact: deactivate broken kernel feature ]

Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49E68C41.4020801@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 54c28d29 03-Apr-2009 Jack Steiner <steiner@sgi.com>

x86, uv: add Kconfig dependency on NUMA for UV systems

Impact: build fix

Add Kconfig dependency on NUMA for enabling UV. Although it might
be possible to configure non-NUMA UV systems, they are unsupported
and not interesting. Much of the infrastructure for UV requires
NUMA support.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20090403203942.GA20137@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f7d7f866 07-Apr-2009 David Woodhouse <dwmw2@infradead.org>

x86, intel-iommu: fix X2APIC && !ACPI build failure

This build failure:

| drivers/pci/dmar.c:47: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘dmar_tbl_size’
| drivers/pci/dmar.c:62: warning: ‘struct acpi_dmar_device_scope’ declared inside parameter list
| drivers/pci/dmar.c:62: warning: its scope is only this definition or declaration, which is probably not what you want

Triggers due to this commit:

d0b03bd: x2apic/intr-remap: decouple interrupt remapping from x2apic

Which exposed a pre-existing but dormant fragility of the 'select X86_X2APIC'
it moved around and turned that fragility into a build failure.

Replace it with a proper 'depends on' construct.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
LKML-Reference: <1239084280.22733.404.camel@macbook.infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d0b03bd1 03-Apr-2009 Han, Weidong <weidong.han@intel.com>

x2apic/intr-remap: decouple interrupt remapping from x2apic

interrupt remapping must be enabled before enabling x2apic, but
interrupt remapping doesn't depend on x2apic, it can be used
separately. Enable interrupt remapping in init_dmars even x2apic
is not supported.

[dwmw2: Update Kconfig accordingly, fix build with INTR_REMAP && !X2APIC]

Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# 6a11f75b 31-Mar-2009 Akinobu Mita <akinobu.mita@gmail.com>

generic debug pagealloc

CONFIG_DEBUG_PAGEALLOC is now supported by x86, powerpc, sparc64, and
s390. This patch implements it for the rest of the architectures by
filling the pages with poison byte patterns after free_pages() and
verifying the poison patterns before alloc_pages().

This generic one cannot detect invalid page accesses immediately but
invalid read access may cause invalid dereference by poisoned memory and
invalid write access can be detected after a long delay.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 692105b8 26-Jan-2009 Matt LaPlante <kernel1@cyberdogtech.com>

trivial: fix typos/grammar errors in Kconfig texts

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# fc2869f6 13-Mar-2009 Thomas Gleixner <tglx@linutronix.de>

x86: disable __do_IRQ support

Impact: disable unused code

x86 is fully converted to flow handlers. No need to keep the
deprecated __do_IRQ() support active.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4cf2e75d 11-Feb-2009 David Woodhouse <dwmw2@infradead.org>

intel-iommu: Enable DMAR on 32-bit kernel.

If we fix a few highmem-related thinkos and a couple of printk format
warnings, the Intel IOMMU driver works fine in a 32-bit kernel.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# 2118d0c5 09-Jan-2009 Joerg Roedel <joerg.roedel@amd.com>

dma-debug: x86 architecture bindings

Impact: make use of DMA-API debugging code in x86

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>


# f9a36fa5 13-Mar-2009 Thomas Gleixner <tglx@linutronix.de>

x86: disable __do_IRQ support

Impact: disable unused code

x86 is fully converted to flow handlers. No need to keep the
deprecated __do_IRQ() support active.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1b3fa2ce 06-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/x86: basic implementation of syscall tracing for x86

Provide the x86 trace callbacks to trace syscalls.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1236401580-5758-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 46d50c98 11-Mar-2009 Jan Beulich <jbeulich@novell.com>

x86, 32-bit: also limit NODES_HIGH_SHIFT here

Impact: configuration bug fix

Just like for x86-64, the range of widths valid for NODE_SHIFT is not
unbounded. The upper bound 64-bit uses is definitely also an upper
bound for 32-bit.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B90F12.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fee7b0d8 09-Mar-2009 Huang Ying <ying.huang@intel.com>

x86, kexec: x86_64: add kexec jump support for x86_64

Impact: New major feature

This patch add kexec jump support for x86_64. More information about
kexec jump can be found in corresponding x86_32 support patch.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 9b779edf 10-Mar-2009 Jaswinder Singh Rajput <jaswinderrajput@gmail.com>

x86: cpu architecture debug code

Introduce:

cat /sys/kernel/debug/x86/cpu/*

for Intel and AMD processors to view / debug the state of each CPU.

By using this we can debug whole range of registers and other
cpu information for debugging purpose and monitor how things
are changing.

This can be useful for developers as well as for users.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
LKML-Reference: <1236701373.3387.4.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f6be37fd 25-Feb-2009 Kyle McMartin <kyle@redhat.com>

x86: enable DMAR by default

Now that the obvious bugs have been worked out, specifically
the iwlagn issue, and the write buffer errata, DMAR should be safe
to turn back on by default. (We've had it on since those patches were
first written a few weeks ago, without any noticeable bug reports
(most have been due to the dma-api debug patchset.))

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b2762686 12-Feb-2009 Andi Kleen <andi@firstfloor.org>

x86, mce, cmci: factor out threshold interrupt handler

Impact: cleanup; preparation for feature

The mce_amd_64 code has an own private MC threshold vector with an own
interrupt handler. Since Intel needs a similar handler
it makes sense to share the vector because both can not
be active at the same time.

I factored the common APIC handler code into a separate file which can
be used by both the Intel or AMD MC code.

This is needed for the next patch which adds an Intel specific
CMCI handler.

This patch should be a nop for AMD, it just moves some code
around.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# c1329375 23-Feb-2009 Tejun Heo <tj@kernel.org>

bootmem: clean up arch-specific bootmem wrapping

Impact: cleaner and consistent bootmem wrapping

By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
arch-specific wrappers for bootmem allocation. However, this is done
a bit strangely in that only the high level convenience macros can be
changed while lower level, but still exported, interface functions
can't be wrapped. This not only is messy but also leads to strange
situation where alloc_bootmem() does what the arch wants it to do but
the equivalent __alloc_bootmem() call doesn't although they should be
able to be used interchangeably.

This patch updates bootmem such that archs can override / wrap the
backend function - alloc_bootmem_core() instead of the highlevel
interface functions to allow simpler and consistent wrapping. Also,
HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@saeurebad.de>


# 965c7eca 22-Feb-2009 Ingo Molnar <mingo@elte.hu>

x86: remove the Voyager 32-bit subarch

Impact: remove unused/broken code

The Voyager subarch last built successfully on the v2.6.26 kernel
and has been stale since then and does not build on the v2.6.27,
v2.6.28 and v2.6.29-rc5 kernels.

No actual users beyond the maintainer reported this breakage.
Patches were sent and most of the fixes were accepted but the
discussion around how to do a few remaining issues cleanly
fizzled out with no resolution and the code remained broken.

In the v2.6.30 x86 tree development cycle 32-bit subarch support
has been reworked and removed - and the Voyager code, beyond the
build problems already known, needs serious and significant
changes and probably a rewrite to support it.

CONFIG_X86_VOYAGER has been marked BROKEN then. The maintainer has
been notified but no patches have been sent so far to fix it.

While all other subarchs have been converted to the new scheme,
voyager is still broken. We'd prefer to receive patches which
clean up the current situation in a constructive way, but even in
case of removal there is no obstacle to add that support back
after the issues have been sorted out in a mutually acceptable
fashion.

So remove this inactive code for now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8425091f 20-Feb-2009 Ravikiran G Thirumalai <kiran@scalex86.org>

x86: improve the help text of X86_EXTENDED_PLATFORM

Change the CONFIG_X86_EXTENDED_PLATFORM help text to display the
32bit/64bit extended platform list. This is as suggested by Ingo.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: shai@scalex86.org
Cc: "Benzi Galili (Benzi@ScaleMP.com)" <benzi@scalemp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 11124411 20-Feb-2009 Tejun Heo <tj@kernel.org>

x86: convert to the new dynamic percpu allocator

Impact: use new dynamic allocator, unified access to static/dynamic
percpu memory

Convert to the new dynamic percpu allocator.

* implement populate_extra_pte() for both 32 and 64
* update setup_per_cpu_areas() to use pcpu_setup_static()
* define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr()
* define config HAVE_DYNAMIC_PER_CPU_AREA

Signed-off-by: Tejun Heo <tj@kernel.org>


# 7d01d32d 16-Feb-2009 Ingo Molnar <mingo@elte.hu>

x86, apic: fix build fallout of genapic changes

- make oprofile build
- select X86_X2APIC from X86_UV - it relies on it
- export genapic for oprofile modular build

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 06cd9a7d 16-Feb-2009 Yinghai Lu <yinghai@kernel.org>

x86: add x2apic config

Impact: cleanup

so could deselect x2apic
and INTR_REMAP will select x2apic

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 17993b49 11-Feb-2009 Ingo Molnar <mingo@elte.hu>

x86: make hibernation always-possible

This commit:

aced3ce: x86/Voyager: remove HIBERNATION Kconfig quirk

Made hibernation only available on UP - instead of making it available
on all of x86. Fix it.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c5c606d9 09-Feb-2009 Ravikiran G Thirumalai <kiran@scalex86.org>

x86: cleanup, rename CONFIG_X86_NON_STANDARD to CONFIG_X86_EXTENDED_PLATFORM

Patch to rename the CONFIG_X86_NON_STANDARD to CONFIG_X86_EXTENDED_PLATFORM.

The new name represents the subarches better. Also, default this to 'y'
so that many of the sub architectures that were not easily visible now
become visible.

Also re-organize the extended architecture platform and non standard
platform list alphabetically as suggested by Ingo.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 60a5317f 09-Feb-2009 Tejun Heo <tj@kernel.org>

x86: implement x86_32 stack protector

Impact: stack protector for x86_32

Implement stack protector for x86_32. GDT entry 28 is used for it.
It's set to point to stack_canary-20 and have the length of 24 bytes.
CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
to the stack canary segment on entry. As %gs is otherwise unused by
the kernel, the canary can be anywhere. It's defined as a percpu
variable.

x86_32 exception handlers take register frame on stack directly as
struct pt_regs. With -fstack-protector turned on, gcc copies the
whole structure after the stack canary and (of course) doesn't copy
back on return thus losing all changed. For now, -fno-stack-protector
is added to all files which contain those functions. We definitely
need something better.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ccbeed3a 09-Feb-2009 Tejun Heo <tj@kernel.org>

x86: make lazy %gs optional on x86_32

Impact: pt_regs changed, lazy gs handling made optional, add slight
overhead to SAVE_ALL, simplifies error_code path a bit

On x86_32, %gs hasn't been used by kernel and handled lazily. pt_regs
doesn't have place for it and gs is saved/loaded only when necessary.
In preparation for stack protector support, this patch makes lazy %gs
handling optional by doing the followings.

* Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.

* Save and restore %gs along with other registers in entry_32.S unless
LAZY_GS. Note that this unfortunately adds "pushl $0" on SAVE_ALL
even when LAZY_GS. However, it adds no overhead to common exit path
and simplifies entry path with error code.

* Define different user_gs accessors depending on LAZY_GS and add
lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS. The
lazy_*_gs() ops are used to save, load and clear %gs lazily.

* Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly.

xen and lguest changes need to be verified.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9a5fd902 05-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ftrace: change function graph tracer to use new in_nmi

The function graph tracer piggy backed onto the dynamic ftracer
to use the in_nmi custom code for dynamic tracing. The problem
was (as Andrew Morton pointed out) it really only wanted to bail
out if the context of the current CPU was in NMI context. But the
dynamic ftrace in_nmi custom code was true if _any_ CPU happened
to be in NMI context.

Now that we have a generic in_nmi interface, this patch changes
the function graph code to use it instead of the dynamic ftarce
custom code.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 78d904b4 05-Feb-2009 Steven Rostedt <srostedt@redhat.com>

ring-buffer: add NMI protection for spinlocks

Impact: prevent deadlock in NMI

The ring buffers are not yet totally lockless with writing to
the buffer. When a writer crosses a page, it grabs a per cpu spinlock
to protect against a reader. The spinlocks taken by a writer are not
to protect against other writers, since a writer can only write to
its own per cpu buffer. The spinlocks protect against readers that
can touch any cpu buffer. The writers are made to be reentrant
with the spinlocks disabling interrupts.

The problem arises when an NMI writes to the buffer, and that write
crosses a page boundary. If it grabs a spinlock, it can be racing
with another writer (since disabling interrupts does not protect
against NMIs) or with a reader on the same CPU. Luckily, most of the
users are not reentrant and protects against this issue. But if a
user of the ring buffer becomes reentrant (which is what the ring
buffers do allow), if the NMI also writes to the ring buffer then
we risk the chance of a deadlock.

This patch moves the ftrace_nmi_enter called by nmi_enter() to the
ring buffer code. It replaces the current ftrace_nmi_enter that is
used by arch specific code to arch_ftrace_nmi_enter and updates
the Kconfig to handle it.

When an NMI is called, it will set a per cpu variable in the ring buffer
code and will clear it when the NMI exits. If a write to the ring buffer
crosses page boundaries inside an NMI, a trylock is used on the spin
lock instead. If the spinlock fails to be acquired, then the entry
is discarded.

This bug appeared in the ftrace work in the RT tree, where event tracing
is reentrant. This workaround solved the deadlocks that appeared there.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 8f9ca475 05-Feb-2009 Ingo Molnar <mingo@elte.hu>

x86: clean up arch/x86/Kconfig*

- Consistent alignment of help text
- Use the ---help--- keyword everywhere consistently as a visual separator
- fix whitespace mismatches

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0cd5c3c8 04-Feb-2009 Kyle McMartin <kyle@redhat.com>

x86: disable intel_iommu support by default

Due to recurring issues with DMAR support on certain platforms.
There's a number of filesystem corruption incidents reported:

https://bugzilla.redhat.com/show_bug.cgi?id=479996
http://bugzilla.kernel.org/show_bug.cgi?id=12578

Provide a Kconfig option to change whether it is enabled by
default.

If disabled, it can still be reenabled by passing intel_iommu=on to the
kernel. Keep the .config option off by default.

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 26f7ef14 29-Jan-2009 Yinghai Lu <yinghai@kernel.org>

x86: don't treat bigsmp as non-standard

just like 64 bit switch from flat logical APIC messages to
flat physical mode automatically.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4272ebfb 29-Jan-2009 Yinghai Lu <yinghai@kernel.org>

x86: allow more than 8 cpus to be used on 32-bit

X86_PC is the only remaining 'sub' architecture, so we dont need
it anymore.

This also cleans up a few spurious references to X86_PC in the
driver space - those certainly should be X86.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3769e7b4 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: move to the X86_32_NON_STANDARD code section

Make Voyager depend on X86_32_NON_STANDARD - it is a non-standard 32-bit
SMP architecture.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e0c7ae37 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: rename X86_GENERICARCH to X86_32_NON_STANDARD

X86_GENERICARCH is a misnomer - it contains non-PC 32-bit architectures
that are not included in the default build.

Rename it to X86_32_NON_STANDARD.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e2c75d9f 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: remove the subarch menu

Remove the subarch menu and standardize on X86_PC.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6a48565e 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: move X86_VSMP from subarch menu

Move X86_VSMP out of the subarch menu - this way it can be enabled
together with standard PC support as well, in the same kernel.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9c398017 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: move non-standard 32-bit platform Kconfig entries

- make X86_GENERICARCH depend X86_NON_STANDARD

- move X86_SUMMIT, X86_ES7000 and X86_BIGSMP out of the subarchitecture
menu and under this option

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f67ae5c9 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: move VOYAGER to the NON_STANDARD_PLATFORM section

Move X86_ELAN (old, NCR hw platform built on Intel CPUs) from the
subarchitecture menu to the non-standard-platform section.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9e111f3e 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: move ELAN to the NON_STANDARD_PLATFORM section

Move X86_ELAN (old, AMD based web-boxes) from the subarchitecture
menu to the non-standard-platform section.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 06ac8346 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: cleanup, introduce CONFIG_NON_STANDARD_PLATFORMS

Introduce a Y/N Kconfig option for non-PC x86 platforms.

Make VisWS, RDC321 and SGI/UV depend on this.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1ec2dafd 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove ISA quirk

Voyager has this ISA quirk (because Voyager has no ISA support):

config ISA
bool "ISA support"
depends on !X86_VOYAGER

There's a ton of x86 hardware that does not support ISA, and because
most ISA drivers cannot auto-detect in a safe way, the convention in
the kernel has always been to not enable ISA drivers if they are not
needed.

Voyager users can do likewise - no need for a Kconfig quirk.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1c61d8c3 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove power management Kconfig quirk

Voyager has this PM/ACPI Kconfig quirk:

menu "Power management and ACPI options"
depends on !X86_VOYAGER

Most of the PM features are auto-detect so they should be safe to run
on just about any hardware. (If not, those instances need fixing.)

In any case, if a kernel is built for Voyager, the power management
options can be disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4b19ed91 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove HOTPLUG_CPU Kconfig quirk

Voyager has this Kconfig quirk:

config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
depends on SMP && HOTPLUG && !X86_VOYAGER

But this exception will be moot once Voyager starts using the
generic x86 code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e006235e 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove MCE quirk

If no MCE code is desired on Voyager hw then the solution
is to turn them off in the .config - and to extend the MCE
code to not initialize on Voyager.

Remove the build-time quirk.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 7cd92366 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove APIC/IO-APIC Kbuild quirk

The lapic/ioapic code properly auto-detects and is safe to run on CPUs that
have no local APIC. (or which have their lapic turned off in the hardware)

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c3e6a204 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove PARAVIRT Kconfig quirk

Remove this Kconfig quirk:

config PARAVIRT
bool "Enable paravirtualization code"
depends on !X86_VOYAGER
help

Voyager support built into a kernel does not preclude paravirt support.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 54523edd 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove KVM_GUEST quirk

Voyager has this quirk currently:

config KVM_GUEST
bool "KVM Guest support"
select PARAVIRT
depends on !X86_VOYAGER

Voyager support built into a kernel image does not exclude
KVM paravirt guest support - so remove this quirk.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e084e531 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove KVM_CLOCK quirk

Voyager has this build-time quirk to exclude KVM_CLOCK:

bool "KVM paravirtualized clock"
select PARAVIRT
select PARAVIRT_CLOCK
depends on !X86_VOYAGER

Voyager support built into a kernel image does not exclude
KVM paravirt clock support - so remove this quirk.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f154f47d 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove VMI Kconfig quirk

x86/Voyager has this build-time quirk:

bool "VMI Guest support"
select PARAVIRT
depends on X86_32
depends on !X86_VOYAGER

Since VMI is auto-detected (and Voyager will be auto-detected) there's no
reason for this quirk.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 36619a8a 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/VisWS: remove Kconfig quirk

VisWS has this quirk currently:

config X86_VISWS
bool "SGI 320/540 (Visual Workstation)"
depends on X86_32 && PCI && !X86_VOYAGER && X86_MPPARSE && PCI_GODIRECT

The !Voyager quirk is unnecessary.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 550fe4f1 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove X86_FIND_SMP_CONFIG Kconfig quirk

x86/Voyager had this Kconfig quirk:

config X86_FIND_SMP_CONFIG
def_bool y
depends on X86_MPPARSE || X86_VOYAGER

Which splits off the find_smp_config() callback into a build-time quirk.

Voyager should use the existing x86_quirks.mach_find_smp_config() callback
to introduce SMP-config quirks. NUMAQ-32 and VISWS already use this.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f095df0a 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove X86_BIOS_REBOOT Kconfig quirk

Voyager has this Kconfig quirk:

config X86_BIOS_REBOOT
bool
depends on !X86_VOYAGER
default y

Voyager should use the existing machine_ops.emergency_restart reboot
quirk mechanism instead of a build-time quirk.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 23394d1c 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove X86_HT Kconfig quirk

Voyager has this Kconfig quirk:

depends on (X86_32 && !X86_VOYAGER) || X86_64

That is unnecessary as HT support is CPUID driven and explicitly
enumerated.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c0b5842a 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: generalize boot_cpu_id

x86/Voyager can boot on non-zero processors. While that can probably
be fixed by properly remapping the physical CPU IDs, keep boot_cpu_id
for now for easier transition - and expand it to all of x86.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3e5095d1 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: replace CONFIG_X86_SMP with CONFIG_SMP

The x86/Voyager subarch used to have this distinction between
'x86 SMP support' and 'Voyager SMP support':

config X86_SMP
bool
depends on SMP && ((X86_32 && !X86_VOYAGER) || X86_64)

This is a pointless distinction - Voyager can (and already does) use
smp_ops to implement various SMP quirks it has - and it can be extended
more to cover all the specialities of Voyager.

So remove this complication in the Kconfig space.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f2fc0e30 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove ARCH_SUSPEND_POSSIBLE Kconfig quirk

Voyager has this Kconfig quirk for suspend/resume:

config ARCH_SUSPEND_POSSIBLE
def_bool y
depends on !X86_VOYAGER

The proper mechanism to not suspend on a piece of hardware to disable
CONFIG_SUSPEND. Remove the quirk.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# aced3cee 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove HIBERNATION Kconfig quirk

Voyager has this hibernation quirk:

config ARCH_HIBERNATION_POSSIBLE
def_bool y
depends on !SMP || !X86_VOYAGER

Hibernation is a generic facility provided on all x86 platforms. If it
is buggy on Voyager then that bug should be fixed - not worked around.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 49793b03 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove KGDB Kconfig quirk

x86/Voyager has this KGDB quirk:

select HAVE_ARCH_KGDB if !X86_VOYAGER

This is completely pointless - there's nothing in KGDB that cannot work
on Voyager. Remove it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e0ec9483 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove KVM Kconfig quirk

Voyager and other subarchitectures have this Kconfig quirk:

select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)

This is unnecessary, as KVM cleanly detects based on CPUID capabilities.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 07ef83ae 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove NATSEMI Kconfig quirk

x86/Voyager has this quirk for SCx200 support:

config SCx200
tristate "NatSemi SCx200 support"
depends on !X86_VOYAGER

Remove it - Voyager users can disable drivers they dont need.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 72ee6ebb 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: remove MCA Kconfig quirk

Remove Voyager Kconfig quirk: just like any other hardware platform
users of Voyager systems can configure in the hardware drivers they need.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 61b8172e 28-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: disable Voyager temporarily

x86/Voyager does not build right now and it's unclear whether it will
be cleaned up and ported to the subarch-less 32-bit x86 code - so disable
it for now.

If it's fixed we'll re-enable it - or remove it after some time. There's
a very low number of systems running development kernels on x86/Voyager
currently. (one or two on the whole planet)

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e7c64981 27-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86/Voyager: clean up BROKEN Kconfig reference

CONFIG_BROKEN has been removed from the upstream kernel years ago,
but X86_VOYAGER still had a stale reference to it - remove it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 89c9c4c5 26-Jan-2009 Brian Gerst <brgerst@gmail.com>

x86: make Voyager use x86 per-cpu setup.

Impact: standardize all x86 platforms on same setup code

With the preceding changes, Voyager can use the same per-cpu setup
code as all the other x86 platforms.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 03b48632 19-Jan-2009 Nick Piggin <npiggin@suse.de>

x86: make UV support configurable

Make X86 SGI Ultraviolet support configurable. Saves about 13K of text size
on my modest config.

text data bss dec hex filename
6770537 1158680 694356 8623573 8395d5 vmlinux
6757492 1157664 694228 8609384 835e68 vmlinux.nouv

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# da4276b8 07-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: offer frame pointers in all build modes

CONFIG_FRAME_POINTERS=y results in much better debug info for the
kernel (clear and precise backtraces), with the only drawback being
a ~1% increase in kernel size.

So offer it unconditionally and enable it by default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2e9f3bdd 04-Jan-2009 H. Peter Anvin <hpa@zytor.com>

bzip2/lzma: make config machinery an arch configurable

Impact: Bug fix (we should not show this menu on irrelevant architectures)

Make the config machinery to drive the gzip/bzip2/lzma selection
dependent on the architecture advertising HAVE_KERNEL_* so that we
don't display this for architectures where it doesn't matter.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 2e117604 11-Dec-2008 Joerg Roedel <joerg.roedel@amd.com>

AMD IOMMU: add Kconfig entry for statistic collection code

Impact: adds new Kconfig entry

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>


# 1aaf1183 26-Nov-2008 Joerg Roedel <joerg.roedel@amd.com>

select IOMMU_API when DMAR and/or AMD_IOMMU is selected

These two IOMMUs can implement the current version of this API. So
select the API if one or both of these IOMMU drivers is selected.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>


# 973656fe 25-Dec-2008 Ingo Molnar <mingo@elte.hu>

x86, sparseirq: clean up Kconfig entry

Impact: improve help text

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4e17fee2 24-Dec-2008 Ingo Molnar <mingo@elte.hu>

x86: turn CONFIG_SPARSE_IRQ off by default

New feature - lets disable it by default first - can flip it around
later.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b9098957 19-Dec-2008 Yinghai Lu <yinghai@kernel.org>

sparseirq: fix numa_migrate_irq_desc dependency and comments

Impact: reduce kconfig variable scope and clean up

Bartlomiej pointed out that the config dependencies and comments are not right.

update it depend to NUMA, and fix some comments

Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 78637a97 16-Dec-2008 Mike Travis <travis@sgi.com>

x86: Set CONFIG_NR_CPUS even on UP

Impact: cleanup

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>


# 36f5101a 16-Dec-2008 Mike Travis <travis@sgi.com>

x86: enable MAXSMP

Impact: activates new off-stack cpumask code on MAXSMP (non-default) x86 configs

Set MAXSMP to enable CONFIG_CPUMASK_OFFSTACK which moves cpumask's off
the stack (and in structs) when using cpumask_var_t.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hy>


# 17483a1f 12-Dec-2008 Yinghai Lu <yinghai@kernel.org>

sparseirq: fix !SMP building, #2

Impact: build fix

make intr_remapping.c to include smp.h, so could use boot_cpu_id there

also remove old change that disabling sparseirq with !SMP

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 48a1b10a 11-Dec-2008 Yinghai Lu <yinghai@kernel.org>

x86, sparseirq: move irq_desc according to smp_affinity, v7

Impact: improve NUMA handling by migrating irq_desc on smp_affinity changes

if CONFIG_NUMA_MIGRATE_IRQ_DESC is set:

- make irq_desc to go with affinity aka irq_desc moving etc
- call move_irq_desc in irq_complete_move()
- legacy irq_desc is not moved, because they are allocated via static array

for logical apic mode, need to add move_desc_in_progress_in_same_domain,
otherwise it will not be moved ==> also could need two phases to get
irq_desc moved.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b93a531e 16-Dec-2008 Jan Beulich <jbeulich@novell.com>

allow bug table entries to use relative pointers (and use it on x86-64)

Impact: reduce bug table size

This allows reducing the bug table size by half. Perhaps there are
other 64-bit architectures that could also make use of this.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ee06094f 13-Dec-2008 Ingo Molnar <mingo@elte.hu>

perfcounters: restructure x86 counter math

Impact: restructure code

Change counter math from absolute values to clear delta logic.

We try to extract elapsed deltas from the raw hw counter - and put
that into the generic counter.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8a4830f8 09-Dec-2008 Yinghai Lu <yinghai@kernel.org>

sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build

Ingo Molnar wrote:

>>> drivers/pci/intr_remapping.c: In function 'irq_2_iommu_alloc':
>>> drivers/pci/intr_remapping.c:72: error: 'boot_cpu_id' undeclared (first use in this function)
>>> drivers/pci/intr_remapping.c:72: error: (Each undeclared identifier is reported only once
>>> drivers/pci/intr_remapping.c:72: error: for each function it appears in.)

sparseirq should only be used with SMP for now.


# 241771ef 03-Dec-2008 Ingo Molnar <mingo@elte.hu>

performance counters: x86 support

Implement performance counters for x86 Intel CPUs.

It's simplified right now: the PERFMON CPU feature is assumed,
which is available in Core2 and later Intel CPUs.

The design is flexible to be extended to more CPU types as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0b8f1efa 05-Dec-2008 Yinghai Lu <yinghai@kernel.org>

sparse irq_desc[] array: core kernel and x86 changes

Impact: new feature

Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.

To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.

When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).

This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8daa1905 01-Dec-2008 Niels de Vos <niels.devos@wincor-nixdorf.com>

x86, apm: remove CONFIG_APM_REAL_MODE_POWER_OFF in favor of a kernel parameter

Remove CONFIG_APM_REAL_MODE_POWER_OFF like CONFIG_APM_POWER_OFF which
has been done for linux-2.2.14pre8 (http://lkml.org/lkml/1999/11/23/3).

Re-introducing CONFIG_APM_POWER_OFF got nack-ed. Stephen didn't bother
to remove CONFIG_APM_REAL_MODE_POWER_OFF, let's get rid of it now.
Reference: http://lkml.org/lkml/2008/5/7/97

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 48d68b20 01-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-graph-tracer: support for x86-64

Impact: extend and enable the function graph tracer to 64-bit x86

This patch implements the support for function graph tracer under x86-64.
Both static and dynamic tracing are supported.

This causes some small CPP conditional asm on arch/x86/kernel/ftrace.c I
wanted to use probe_kernel_read/write to make the return address
saving/patching code more generic but it causes tracing recursion.

That would be perhaps useful to implement a notrace version of these
function for other archs ports.

Note that arch/x86/process_64.c is not traced, as in X86-32. I first
thought __switch_to() was responsible of crashes during tracing because I
believed current task were changed inside but that's actually not the
case (actually yes, but not the "current" pointer).

So I will have to investigate to find the functions that harm here, to
enable tracing of the other functions inside (but there is no issue at
this time, while process_64.c stays out of -pg flags).

A little possible race condition is fixed inside this patch too. When the
tracer allocate a return stack dynamically, the current depth is not
initialized before but after. An interrupt could occur at this time and,
after seeing that the return stack is allocated, the tracer could try to
trace it with a random uninitialized depth. It's a prevention, even if I
hadn't problems with it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2c5643b1 30-Nov-2008 Hitoshi Mitake <h.mitake@gmail.com>

x86: provide readq()/writeq() on 32-bit too

Impact: add new API for drivers

Add implementation of readq/writeq to x86_32, and add config value to
the x86 architecture to determine existence of readq/writeq.

Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fb52607a 25-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-return-tracer: change the name into function-graph-tracer

Impact: cleanup

This patch changes the name of the "return function tracer" into
function-graph-tracer which is a more suitable name for a tracing
which makes one able to retrieve the ordered call stack during
the code flow.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e45f2c07 24-Nov-2008 Denis V. Lunev <den@openvz.org>

x86: correct link to HPET timer specification

Impact: update documentation / help text

Original link is dead.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8d26487f 22-Nov-2008 Török Edwin <edwintorok@gmail.com>

tracing/stack-tracer: introduce CONFIG_USER_STACKTRACE_SUPPORT

Impact: cleanup

User stack tracing is just implemented for x86, but it is not x86 specific.

Introduce a generic config flag, that is currently enabled only for x86.
When other arches implement it, they will have to
SELECT USER_STACKTRACE_SUPPORT.

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b5fe363b 18-Nov-2008 Yinghai Lu <yinghai@kernel.org>

x86: use update_genapic to get rid of ES7000_CLUSTERED_APIC v2

Impact: clean up

We can autodetect those system that need cluster apic, and update genapic
accordingly.

We can also remove wakeup.h for e7000, because it's default one is now
the same as overall default mach_wakecpu.h

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a1afd01c 17-Nov-2008 Joerg Roedel <joerg.roedel@amd.com>

x86: default to SWIOTLB=y on x86_64

Impact: fixes korg bugzilla 11980

A kernel for a 64bit x86 system should always contain the swiotlb code
in case it is booted on a machine without any hardware IOMMU supported
by the kernel and more than 4GB of RAM. This patch changes Kconfig to
always compile swiotlb into the kernel for x86_64.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 604d2055 12-Nov-2008 Rafael J. Wysocki <rjw@rjwysocki.net>

x86: make NUMA on 32-bit depend on EXPERIMENTAL again

My previous patch to make CONFIG_NUMA on x86_32 depend on BROKEN
turned out to be unnecessary, after all, since the source of the
hibernation vs CONFIG_NUMA problem turned out to be the fact that
we didn't take the NUMA KVA remapping into account in the
hibernation code.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c280ea5e 08-Nov-2008 Ingo Molnar <mingo@elte.hu>

x86: fix documentation typo in arch/x86/Kconfig

Impact: documentation update

Chris Snook pointed out that it's Core i7, not Core 7i.

Reported-by: Chris Snook <csnook@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6cd10f8d 09-Nov-2008 James Bottomley <James.Bottomley@HansenPartnership.com>

x86, voyager: fix smp generic helper voyager breakage

Impact: build/boot fix for x86/Voyager

This change:

| commit 3d4422332711ef48ef0f132f1fcbfcbd56c7f3d1
| Author: Jens Axboe <jens.axboe@oracle.com>
| Date: Thu Jun 26 11:21:34 2008 +0200
|
| Add generic helpers for arch IPI function calls

didn't wire up the voyager smp call function correctly, so do that
here. Also make CONFIG_USE_GENERIC_SMP_HELPERS a def_bool y again,
since we now use the generic helpers for every x86 architecture.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <Jens.Axboe@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f1c4be5e 11-Nov-2008 Ingo Molnar <mingo@elte.hu>

tracing, x86: clean up FUNCTION_RET_TRACER Kconfig

Impact: cleanup

move FUNCTION_RET_TRACER to the X86 select section, where we have all the
other options.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# caf4b323 10-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing, x86: add low level support for ftrace return tracing

Impact: add infrastructure for function-return tracing

Add low level support for ftrace return tracing.

This plug-in stores return addresses on the thread_info structure of
the current task.

The index of the current return address is initialized when the task
is the first one (init) and when a process forks (the child). It is
not needed when a task does a sys_execve because after this syscall,
it still needs to return on the kernel functions it called.

Note that the code of return_to_handler has been suggested by Steven
Rostedt as almost all of the ideas of improvements in this V3.

For purpose of security, arch/x86/kernel/process_32.c is not traced
because __switch_to() changes the current task during its execution.
That could cause inconsistency in the stored return address of this
function even if I didn't have any crash after testing with tracing on
this function enabled.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ae1e9130 11-Nov-2008 Ingo Molnar <mingo@elte.hu>

sched: rename SCHED_NO_NO_OMIT_FRAME_POINTER => SCHED_OMIT_FRAME_POINTER

Impact: cleanup, change .config option name

We had this ugly config name for a long time for hysteric raisons.
Rename it to a saner name.

We still cannot get rid of it completely, until /proc/<pid>/stack
usage replaces WCHAN usage for good.

We'll be able to do that in the v2.6.29/v2.6.30 timeframe.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4694516d 10-Nov-2008 Rafael J. Wysocki <rjw@rjwysocki.net>

x86: Make NUMA on 32-bit depend on BROKEN

While investigating the failure of hibernation on 32-bit x86 with
CONFIG_NUMA set, as described in this message
http://marc.info/?l=linux-kernel&m=122634118116226&w=4
I asked some people for help and I was told that it wasn't really
worth the effort, because CONFIG_NUMA was generally broken on 32-bit
x86 systems and it shouldn't be used in such configs. For this
reason, make CONFIG_NUMA depend on BROKEN instead of EXPERIMENTAL on
x86-32.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Machek <pavel@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a87d0914 06-Nov-2008 Ken Chen <kenchen@google.com>

x86, sched: enable wchan config menu item on 64-bit

Enable the wchan config menu item for now on x86-64 arch? This will
at least allow people to enable/disable frame pointers on scheduler
functions.

Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fd51b2d7 04-Nov-2008 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

x86: update CONFIG_NUMA description

Impact: clarify/update CONFIG_NUMA text

CONFIG_NUMA description talk about a bit old thing.
So, following changes are better.

o CONFIG_NUMA is no longer EXPERIMENTAL

o Opteron is not the only processor of NUMA topology on x86_64 no longer,
but also Intel Core7i has it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# da85f865 05-Nov-2008 Bjorn Helgaas <bjorn.helgaas@hp.com>

x86: mention ACPI in top-level Kconfig menu

Impact: clarify menuconfig text

Mention ACPI in the top-level menu to give a clue as to where
it lives. This matches what ia64 does.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 60a7ecf4 05-Nov-2008 Steven Rostedt <srostedt@redhat.com>

ftrace: add quick function trace stop

Impact: quick start and stop of function tracer

This patch adds a way to disable the function tracer quickly without
the need to run kstop_machine. It adds a new variable called
function_trace_stop which will stop the calls to functions from mcount
when set. This is just an on/off switch and does not handle recursion
like preempt_disable().

It's main purpose is to help other tracers/debuggers start and stop tracing
fuctions without the need to call kstop_machine.

The config option HAVE_FUNCTION_TRACE_MCOUNT_TEST is added for archs
that implement the testing of the function_trace_stop in the mcount
arch dependent code. Otherwise, the test is done in the C code.

x86 is the only arch at the moment that supports this.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e5beae16 03-Nov-2008 Keith Packard <keithp@keithp.com>

io mapping: clean up #ifdefs

Impact: cleanup

clean up ifdefs: change #ifdef CONFIG_X86_32/64 to
CONFIG_HAVE_ATOMIC_IOMAP.

flip around the #ifdef sections to clean up the structure.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 35551053 31-Oct-2008 Gary Hade <garyhade@us.ibm.com>

x86: add memory hotremove config option

Impact: enable CONFIG_MEMORY_HOTREMOVE feature on x86. (default-off)

Memory hotremove functionality can currently be configured into
the ia64, powerpc, and s390 kernels.

This patch makes it possible to configure the memory hotremove
functionality into the x86 kernel as well.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b3572e36 30-Oct-2008 James Bottomley <James.Bottomley@HansenPartnership.com>

x86/voyager: fix compile breakage caused by dc1e35c6e95e8923cf1d3510438b63c600fee1e2

Impact: build fix on x86/Voyager

Given commits like this:

| Author: Suresh Siddha <suresh.b.siddha@intel.com>
| Date: Tue Jul 29 10:29:19 2008 -0700
|
| x86, xsave: enable xsave/xrstor on cpus with xsave support

Which deliberately expose boot cpu dependence to pieces of the system,
I think it's time to explicitly have a variable for it to prevent this
continual misassumption that the boot CPU is zero.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 7a527688 30-Oct-2008 Jan Beulich <jbeulich@novell.com>

x86: simplify X86_MPPARSE config option

Impact: cleanup

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9e899816 21-Oct-2008 Nick Piggin <npiggin@suse.de>

x86, mm: enable GBPAGES option by default

DIRECT_GBPAGES was under DEBUG_KERNEL && EXPERIMENTAL and disabled by default.
Turn it on by default and put it under EMBEDDED.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 27471fdb 09-Oct-2008 Andy Henroid <andrew.d.henroid@intel.com>

i7300_idle driver v1.55

The Intel 7300 Memory Controller supports dynamic throttling of memory which can
be used to save power when system is idle. This driver does the memory
throttling when all CPUs are idle on such a system.

Refer to "Intel 7300 Memory Controller Hub (MCH)" datasheet
for the config space description.

Signed-off-by: Andy Henroid <andrew.d.henroid@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>


# 606576ce 06-Oct-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: rename FTRACE to FUNCTION_TRACER

Due to confusion between the ftrace infrastructure and the gcc profiling
tracer "ftrace", this patch renames the config options from FTRACE to
FUNCTION_TRACER. The other two names that are offspring from FTRACE
DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same.

This patch was generated mostly by script, and partially by hand.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# dc52ddc0 18-Oct-2008 Matt Helsley <matthltc@us.ibm.com>

container freezer: implement freezer cgroup subsystem

This patch implements a new freezer subsystem in the control groups
framework. It provides a way to stop and resume execution of all tasks in
a cgroup by writing in the cgroup filesystem.

The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks
in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup. Reading will return the current state.

* Examples of usage :

# mkdir /containers/freezer
# mount -t cgroup -ofreezer freezer /containers
# mkdir /containers/0
# echo $some_pid > /containers/0/tasks

to get status of the freezer subsystem :

# cat /containers/0/freezer.state
RUNNING

to freeze all tasks in the container :

# echo FROZEN > /containers/0/freezer.state
# cat /containers/0/freezer.state
FREEZING
# cat /containers/0/freezer.state
FROZEN

to unfreeze all tasks in the container :

# echo RUNNING > /containers/0/freezer.state
# cat /containers/0/freezer.state
RUNNING

This is the basic mechanism which should do the right thing for user space
task in a simple scenario.

It's important to note that freezing can be incomplete. In that case we
return EBUSY. This means that some tasks in the cgroup are busy doing
something that prevents us from completely freezing the cgroup at this
time. After EBUSY, the cgroup will remain partially frozen -- reflected
by freezer.state reporting "FREEZING" when read. The state will remain
"FREEZING" until one of these things happens:

1) Userspace cancels the freezing operation by writing "RUNNING" to
the freezer.state file
2) Userspace retries the freezing operation by writing "FROZEN" to
the freezer.state file (writing "FREEZING" is not legal
and returns EIO)
3) The tasks that blocked the cgroup from entering the "FROZEN"
state disappear from the cgroup's set of tasks.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: export thaw_process]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 89cedfef 16-Oct-2008 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>

cpuidle: upon BIOS bug, default to default_idle rather than polling

http://bugzilla.kernel.org/show_bug.cgi?id=11345

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9ba16087 15-Oct-2008 Jan Beulich <jbeulich@novell.com>

Kconfig: eliminate "def_bool n" constructs

Using "def_bool n" is pointless, simply using bool here appears more
appropriate.

Further, retaining such options that don't have a prompt and aren't
selected by anything seems also at least questionable.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d6c88a50 15-Oct-2008 Thomas Gleixner <tglx@linutronix.de>

genirq: revert dynarray

Revert the dynarray changes. They need more thought and polishing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3235e936 15-Oct-2008 Thomas Gleixner <tglx@linutronix.de>

x86: remove sparse irq from Kconfig

This code is not ready yet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 8f09cd20 19-Aug-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: make HAVE_SPARSE_IRQ support selectable

Ingo said sparse_irq is some intrusive. need to make it selectable

to make it simple, remove irq_desc as parameter in some functions.
(ack, eoi, set_affinity).
may need to make member if irq_chip to take irq_desc, or struct irq later.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 199751d7 19-Aug-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: make 32 bit to use sparse_irq

but actually irq still needs to be less than NR_IRQS, because
interrupt[NR_IRQS] in entry.S.

need to enable per_cpu vector...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8b8e8c1b 19-Aug-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: remove irqbalance in kernel for 32 bit

This has been deprecated for years, the user space irqbalanced utility
works better with numa, has configurable policies, etc...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmai.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 08678b08 19-Aug-2008 Yinghai Lu <yhlu.kernel@gmail.com>

generic: sparse irqs: use irq_desc() together with dyn_array, instead of irq_desc[]

add CONFIG_HAVE_SPARSE_IRQ to for use condensed array.
Get rid of irq_desc[] array assumptions.

Preallocate 32 irq_desc, and irq_desc() will try to get more.

( No change in functionality is expected anywhere, except the odd build
failure where we missed a code site or where a crossing commit itroduces
new irq_desc[] usage. )

v2: according to Eric, change get_irq_desc() to irq_desc()

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6da55c3e 19-Aug-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: enable dyn_array support

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e4b2b886 14-Aug-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: enable using mcount recording on x86

Enable the use of the __mcount_loc infrastructure on x86_64 and i386.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b8073050 05-Oct-2008 Ingo Molnar <mingo@elte.hu>

x86: remove additional_cpus configurability

additional_cpus=<x> parameter is dangerous and broken: for example
if we boot additional_cpus=-2 on a stock dual-core system it will
crash the box on bootup.

So reduce the maze of code a bit by removingthe user-configurability
angle.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 7f2f49a5 02-Oct-2008 Chuck Ebbert <cebbert@redhat.com>

x86: allow number of additional hotplug CPUs to be set at compile time, V2

x86: allow number of additional hotplug CPUs to be set at compile time, V2

The default number of additional CPU IDs for hotplugging is determined
by asking ACPI or mptables how many "disabled" CPUs there are in the
system, but many systems get this wrong so that e.g. a uniprocessor
machine gets an extra CPU allocated and never switches to single CPU
mode.

And sometimes CPU hotplugging is enabled only for suspend/hibernate
anyway, so the additional CPU IDs are not wanted. Allow the number
to be set to zero at compile time.

Also, force the number of extra CPUs to zero if hotplugging is disabled
which allows removing some conditional code.

Tested on uniprocessor x86_64 that ACPI claims has a disabled processor,
with CPU hotplugging configured.

("After" has the number of additional CPUs set to 0)
Before: NR_CPUS: 512, nr_cpu_ids: 2, nr_node_ids 1
After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1

[Changed the name of the option and the prompt according to Ingo's
suggestion.]

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2ffb3501 30-Sep-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: change MTRR_SANITIZER to def_bool y

This option has been added in v2.6.26 as a default-disabled
feature and went through several revisions since then.

The feature fixes a wide range of MTRR setup problems that BIOSes
leave us with: slow system, slow Xorg, slow system when adding lots
of RAM, etc., so we want to enable it by default for v2.6.28.

See:

[Bug 10508] Upgrade to 4GB of RAM messes up MTRRs
http://bugzilla.kernel.org/show_bug.cgi?id=10508

and the test results in:

http://lkml.org/lkml/2008/9/29/273

1. hpa
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x13c000000 (5056MB), size= 64MB: uncachable, count=1
reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg04: base=0xbf700000 (3063MB), size= 1MB: uncachable, count=1
reg05: base=0xbf800000 (3064MB), size= 8MB: uncachable, count=1

will get
Found optimal setting for mtrr clean up
gran_size: 1M chunk_size: 128M num_reg: 6 lose RAM: 0M
range0: 0000000000000000 - 00000000c0000000
Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
hole: 00000000bf700000 - 00000000c0000000
Setting variable MTRR 2, base: 3063MB, range: 1MB, type UC
Setting variable MTRR 3, base: 3064MB, range: 8MB, type UC
range0: 0000000100000000 - 0000000140000000
Setting variable MTRR 4, base: 4096MB, range: 1024MB, type WB
hole: 000000013c000000 - 0000000140000000
Setting variable MTRR 5, base: 5056MB, range: 64MB, type UC

2. Dylan Taft
reg00: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg01: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg02: base=0x120000000 (4608MB), size= 256MB: write-back, count=1
reg03: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
reg04: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg05: base=0xc7e00000 (3198MB), size= 2MB: uncachable, count=1
reg06: base=0xc8000000 (3200MB), size= 128MB: uncachable, count=1

will get
Found optimal setting for mtrr clean up
gran_size: 1M chunk_size: 4M num_reg: 6 lose RAM: 0M
range0: 0000000000000000 - 00000000c8000000
Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
Setting variable MTRR 2, base: 3072MB, range: 128MB, type WB
hole: 00000000c7e00000 - 00000000c8000000
Setting variable MTRR 3, base: 3198MB, range: 2MB, type UC
rangeX: 0000000100000000 - 0000000130000000
Setting variable MTRR 4, base: 4096MB, range: 512MB, type WB
Setting variable MTRR 5, base: 4608MB, range: 256MB, type WB

3. Gabriel
reg00: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg04: base=0x120000000 (4608MB), size= 128MB: write-back, count=1
reg05: base=0x128000000 (4736MB), size= 64MB: write-back, count=1
reg06: base=0xcf600000 (3318MB), size= 2MB: uncachable, count=1

will get
Found optimal setting for mtrr clean up
gran_size: 1M chunk_size: 16M num_reg: 7 lose RAM: 0M
range0: 0000000000000000 - 00000000d0000000
Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
Setting variable MTRR 2, base: 3072MB, range: 256MB, type WB
hole: 00000000cf600000 - 00000000cf800000
Setting variable MTRR 3, base: 3318MB, range: 2MB, type UC
rangeX: 0000000100000000 - 000000012c000000
Setting variable MTRR 4, base: 4096MB, range: 512MB, type WB
Setting variable MTRR 5, base: 4608MB, range: 128MB, type WB
Setting variable MTRR 6, base: 4736MB, range: 64MB, type WB

4. Mika Fischer
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg03: base=0xbf700000 (3063MB), size= 1MB: uncachable, count=1
reg04: base=0xbf800000 (3064MB), size= 8MB: uncachable, count=1

will get
Found optimal setting for mtrr clean up
gran_size: 1M chunk_size: 16M num_reg: 5 lose RAM: 0M
range0: 0000000000000000 - 00000000c0000000
Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
hole: 00000000bf700000 - 00000000c0000000
Setting variable MTRR 2, base: 3063MB, range: 1MB, type UC
Setting variable MTRR 3, base: 3064MB, range: 8MB, type UC
rangeX: 0000000100000000 - 0000000140000000
Setting variable MTRR 4, base: 4096MB, range: 1024MB, type WB

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 18dbc916 22-Sep-2008 Dmitry Adamushko <dmitry.adamushko@gmail.com>

x86: moved microcode.c to microcode_intel.c

Combine both generic and arch-specific parts of microcode into a
single module (arch-specific parts are config-dependent).

Also while we are at it, move arch-specific parts from microcode.h
into their respective arch-specific .c files.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: "Peter Oruba" <peter.oruba@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a80dc3e0 11-Sep-2008 Joerg Roedel <joerg.roedel@amd.com>

AMD IOMMU: add MSI interrupt support

The AMD IOMMU can generate interrupts for various reasons. This patch
adds the basic interrupt enabling infrastructure to the driver.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fc381519 16-Sep-2008 Ingo Molnar <mingo@elte.hu>

x86: add X86_RESERVE_LOW_64K

This bugzilla:

http://bugzilla.kernel.org/show_bug.cgi?id=11237

Documents a wide range of systems where the BIOS utilizes the first
64K of physical memory during suspend/resume and other hardware events.

Currently we reserve this memory on all AMI and Phoenix BIOS systems.
Life is too short to hunt subtle memory corruption problems like this,
so we try to be robust by default.

Still, allow this to be overriden: allow users who want that first 64K
of memory to be available to the kernel disable the quirk, via
CONFIG_X86_RESERVE_LOW_64K=n.

Also, allow the early reservation to overlap with other
early reservations.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8308c54d 11-Sep-2008 Jeremy Fitzhardinge <jeremy@goop.org>

generic: redefine resource_size_t as phys_addr_t

There's no good reason why a resource_size_t shouldn't just be a
physical address, so simply redefine it in terms of phys_addr_t.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 600715dc 11-Sep-2008 Jeremy Fitzhardinge <jeremy@goop.org>

generic: add phys_addr_t for holding physical addresses

Add a kernel-wide "phys_addr_t" which is guaranteed to be able to hold
any physical address. By default it equals the word size of the
architecture, but a 32-bit architecture can set ARCH_PHYS_ADDR_T_64BIT
if it needs a 64-bit phys_addr_t.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b8992195 14-Sep-2008 Alexey Dobriyan <adobriyan@gmail.com>

x86: simpler SYSVIPC_COMPAT definition

X86_64 part is entirely redundant.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9c0bbee8 09-Sep-2008 Alexey Dobriyan <adobriyan@gmail.com>

seccomp: drop now bogus dependency on PROC_FS

seccomp is prctl(2)-driven now.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c885df50 07-Sep-2008 Jeremy Fitzhardinge <jeremy@goop.org>

x86: default corruption check to off, but put parameter default in Kconfig

Default the low memory corruption check to off, but make the default setting of
the memory_corruption_check kernel parameter a config parameter.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9f077871 07-Sep-2008 Jeremy Fitzhardinge <jeremy@goop.org>

x86: clean up memory corruption check and add more kernel parameters

The corruption check is enabled in Kconfig by default, but disabled at runtime.

This patch adds several kernel parameters to control the corruption
check's behaviour; these are documented in kernel-parameters.txt.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5394f80f 07-Sep-2008 Jeremy Fitzhardinge <jeremy@goop.org>

x86: check for and defend against BIOS memory corruption

Some BIOSes have been observed to corrupt memory in the low 64k. This
change:
- Reserves all memory which does not have to be in that area, to
prevent it from being used as general memory by the kernel. Things
like the SMP trampoline are still in the memory, however.
- Clears the reserved memory so we can observe changes to it.
- Adds a function check_for_bios_corruption() which checks and reports on
memory becoming unexpectedly non-zero. Currently it's called in the
x86 fault handler, and the powermanagement debug output.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e17c6d56 16-Jun-2008 David Woodhouse <dwmw2@infradead.org>

Introduce HAVE_AOUT symbol to remove hard-coded arch list for BINFMT_AOUT

HAVE_AOUT doesn't quite do the same thing as the recently removed
ARCH_SUPPORTS_AOUT config option. That was set even on platforms where
binfmt_aout isn't supported, although it's not entirely clear why.

So it's best just to introduce a new symbol, handled consistently with
other similar HAVE_xxx symbols; with a simple 'select' in the arch Kconfig.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# 6b213e1b 15-Jun-2008 David Woodhouse <dwmw2@infradead.org>

Remove redundant CONFIG_ARCH_SUPPORTS_AOUT

We don't need this any more; arguably we never really did.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# d25e26b6 25-Aug-2008 Linus Torvalds <torvalds@linux-foundation.org>

[x86] Clean up MAXSMP Kconfig, and limit NR_CPUS to 512

This fixes a regression that was indirectly caused by commit
1184dc2ffe2c8fb9afb766d870850f2c3165ef25 ("x86: modify Kconfig to allow
up to 4096 cpus").

Allowing 4k CPU's is not practical at this time, because we still have a
number of places that have several 'cpumask_t's on the stack, and a
4k-bit cpumask is 512 bytes of stack-space for each such variable. This
literally caused functions like 'smp_call_function_mask' to have a 2.5kB
stack frame, and several functions to have 2kB stackframes.

With an 8kB stack total, smashing the stack was simply much too likely.
At least bugzilla entry

http://bugzilla.kernel.org/show_bug.cgi?id=11342

was due to this.

The earlier commit to not inline load_module() into sys_init_module()
fixed the particular symptoms of this that Alan Brunelle saw in that
bugzilla entry, but the huge stack waste by cpumask_t's was the more
direct cause.

Some day we'll have allocation helpers that allocate large CPU masks
dynamically, but in the meantime we simply cannot allow cpumasks this
large.

Cc: Alan D. Brunelle <Alan.Brunelle@hp.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 516cbf37 12-Aug-2008 Tim Bird <tim.bird@am.sony.com>

x86, bootup: add built-in kernel command line for x86 (v2)

Allow x86 to support a built-in kernel command line. The built-in
command line can override the one provided by the boot loader, for
those cases where the boot loader is broken or it is difficult
to change the command line in the the boot loader.

H. Peter Anvin wrote:
> Ingo Molnar wrote:
>> Best would be to make it really apparent in the code that nothing
>> changes if this config option is not set. Preferably there should be
>> no extra code at all in that case.
>>
>
> I would like to see this:
[...Nested ifdefs...]

OK. This version changes absolutely nothing if CONFIG_CMDLINE_BOOL is not
set (the default). Also, no space is appended even when CONFIG_CMDLINE_BOOL
is set, but the builtin string is empty. This is less sloppy all the way
around, IMHO.

Note that I use the same option names as on other arches for
this feature.

[ mingo@elte.hu: build fix ]

Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 04b69447 14-Aug-2008 Pavel Machek <pavel@suse.cz>

arch/x86/Kconfig: clean up, experimental adjustement

Adjust experimental tags in Kconfig, update config to notice that
i386/x86_64 is now single architecture.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 912985dc 12-Aug-2008 Rusty Russell <rusty@rustcorp.com.au>

mm: Make generic weak get_user_pages_fast and EXPORT_GPL it

Out of line get_user_pages_fast fallback implementation, make it a weak
symbol, get rid of CONFIG_HAVE_GET_USER_PAGES_FAST.

Export the symbol to modules so lguest can use it.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 99809963 06-Aug-2008 Jeff Chua <jeff.chua.linux@gmail.com>

x86: make sparsemem more available

With CONFIG_X86_PC, I can set CONFIG_SPARSEMEM=y.

With CONFIG_X86_GENERICARCH, CONFIG_SPARSEMEM depends on CONFIG_NUMA.

I'm using the patch below to enable sparsemem instead of flatmem.
System booted and is running.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 7c13e6a3 11-Aug-2008 Dimitri Sivanich <sivanich@sgi.com>

x86: remove EXPERIMENTAL restriction from CONFIG_HOTPLUG_CPU

This removes the EXPERIMENTAL restriction from CONFIG_HOTPLUG_CPU
on the x86 architecture.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 80cc9f10 28-Jul-2008 Peter Oruba <peter.oruba@amd.com>

x86: AMD microcode patch loading support

This patch introduces microcode patch loading for AMD
processors. It is based on previous corresponding work
for Intel processors.

It hooks into the general patch loading module. Main
difference is that a container file format is used to hold
all patch data for multiple processors as well as an
equivalent CPU table, which comes seperately, as opposed
to Intel's microcode patching solution.

Kconfig and Makefile have been changed provice config
and build option for new source file.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8d86f390 28-Jul-2008 Peter Oruba <peter.oruba@amd.com>

x86: major refactoring

Refactored code by introducing a two-module solution.

There is one general module in which vendor specific modules can hook into.
However, that is exclusive, there is only one vendor specific module
allowed at a time. A CPU vendor check makes sure only the correct
module for the underlying system gets called.

Functinally in terms of patch loading itself there are no changes. This
refactoring provides a basis for future implementations of other vendors'
patch loaders.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 7225e751 26-Jul-2008 Randy Dunlap <randy.dunlap@oracle.com>

documentation: move mtrr.txt to Doc/x86/ subdir

Move mtrr.txt to the Documentation/x86/ subdirectory.
Add 00-INDEX to the Documentation/x86/ subdirectory.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 99bbc4b1 20-Apr-2008 Roland McGrath <roland@redhat.com>

x86: tracehook: CONFIG_HAVE_ARCH_TRACEHOOK

The x86 arch code has all the prerequisites, so set HAVE_ARCH_TRACEHOOK.

Signed-off-by: Roland McGrath <roland@redhat.com>


# 8174c430 25-Jul-2008 Nick Piggin <npiggin@suse.de>

x86: lockless get_user_pages_fast()

Implement get_user_pages_fast without locking in the fastpath on x86.

Do an optimistic lockless pagetable walk, without taking mmap_sem or any
page table locks or even mmap_sem. Page table existence is guaranteed by
turning interrupts off (combined with the fact that we're always looking
up the current mm, means we can do the lockless page table walk within the
constraints of the TLB shootdown design). Basically we can do this
lockless pagetable walk in a similar manner to the way the CPU's pagetable
walker does not have to take any locks to find present ptes.

This patch (combined with the subsequent ones to convert direct IO to use
it) was found to give about 10% performance improvement on a 2 socket 8
core Intel Xeon system running an OLTP workload on DB2 v9.5

"To test the effects of the patch, an OLTP workload was run on an IBM
x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel. Comparing
runs with and without the patch resulted in an overall performance
benefit of ~9.8%. Correspondingly, oprofiles showed that samples from
__up_read and __down_read routines that is seen during thread contention
for system resources was reduced from 2.8% down to .05%. Monitoring the
/proc/vmstat output from the patched run showed that the counter for
fast_gup contained a very high number while the fast_gup_slow value was
zero."

(fast_gup is the old name for get_user_pages_fast, fast_gup_slow is a
counter we had for the number of times the slowpath was invoked).

The main reason for the improvement is that DB2 has multiple threads each
issuing direct-IO. Direct-IO uses get_user_pages, and thus the threads
contend the mmap_sem cacheline, and can also contend on page table locks.

I would anticipate larger performance gains on larger systems, however I
think DB2 uses an adaptive mix of threads and processes, so it could be
that thread contention remains pretty constant as machine size increases.
In which case, we stuck with "only" a 10% gain.

The downside of using get_user_pages_fast is that if there is not a pte
with the correct permissions for the access, we end up falling back to
get_user_pages and so the get_user_pages_fast is a bit of extra work.
However this should not be the common case in most performance critical
code.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: Kconfig fix]
[akpm@linux-foundation.org: Makefile fix/cleanup]
[akpm@linux-foundation.org: warning fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 89081d17 25-Jul-2008 Huang Ying <ying.huang@intel.com>

kexec jump: save/restore device state

This patch implements devices state save/restore before after kexec.

This patch together with features in kexec_jump patch can be used for
following:

- A simple hibernation implementation without ACPI support. You can kexec a
hibernating kernel, save the memory image of original system and shutdown
the system. When resuming, you restore the memory image of original system
via ordinary kexec load then jump back.

- Kernel/system debug through making system snapshot. You can make system
snapshot, jump back, do some thing and make another system snapshot.

- Cooperative multi-kernel/system. With kexec jump, you can switch between
several kernels/systems quickly without boot process except the first time.
This appears like swap a whole kernel/system out/in.

- A general method to call program in physical mode (paging turning
off). This can be used to invoke BIOS code under Linux.

The following user-space tools can be used with kexec jump:

- kexec-tools needs to be patched to support kexec jump. The patches
and the precompiled kexec can be download from the following URL:
source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

- makedumpfile with patches are used as memory image saving tool, it
can exclude free pages from original kernel memory image file. The
patches and the precompiled makedumpfile can be download from the
following URL:
source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10

- An initramfs image can be used as the root file system of kexeced
kernel. An initramfs image built with "BuildRoot" can be downloaded
from the following URL:
initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz
All user space tools above are included in the initramfs image.

Usage example of simple hibernation:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y
CONFIG_HIBERNATION=y
CONFIG_KEXEC_JUMP=y

2. Build an initramfs image contains kexec-tool and makedumpfile, or
download the pre-built initramfs image, called rootfs.gz in
following text.

3. Prepare a partition to save memory image of original kernel, called
hibernating partition in following text.

4. Boot kernel compiled in step 1 (kernel A).

5. In the kernel A, load kernel compiled in step 1 (kernel B) with
/sbin/kexec. The shell command line can be as follow:

/sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000
--mem-max=0xffffff --initrd=rootfs.gz

6. Boot the kernel B with following shell command line:

/sbin/kexec -e

7. The kernel B will boot as normal kexec. In kernel B the memory
image of kernel A can be saved into hibernating partition as
follow:

jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='`
echo $jump_back_entry > kexec_jump_back_entry
cp /proc/vmcore dump.elf

Then you can shutdown the machine as normal.

8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
root file system.

9. In kernel C, load the memory image of kernel A as follow:

/sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf

10. Jump back to the kernel A as follow:

/sbin/kexec -e

Then, kernel A is resumed.

Implementation point:

To support jumping between two kernels, before jumping to (executing)
the new kernel and jumping back to the original kernel, the devices
are put into quiescent state, and the state of devices and CPU is
saved. After jumping back from kexeced kernel and jumping to the new
kernel, the state of devices and CPU are restored accordingly. The
devices/CPU state save/restore code of software suspend is called to
implement corresponding function.

Known issues:

- Because the segment number supported by sys_kexec_load is limited,
hibernation image with many segments may not be load. This is
planned to be eliminated by adding a new flag to sys_kexec_load to
make a image can be loaded with multiple sys_kexec_load invoking.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3ab83521 25-Jul-2008 Huang Ying <ying.huang@intel.com>

kexec jump

This patch provides an enhancement to kexec/kdump. It implements the
following features:

- Backup/restore memory used by the original kernel before/after
kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call program in
physical mode (paging turning off). This can be used to call BIOS code under
Linux.

kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:

source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and return:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y

2. Build patched kexec-tool or download the pre-built one.

3. Build some physical mode executable named such as "phy_mode"

4. Boot kernel compiled in step 1.

5. Load physical mode executable with /sbin/kexec. The shell command
line can be as follow:

/sbin/kexec --load-preserve-context --args-none phy_mode

6. Call physical mode executable with following shell command line:

/sbin/kexec -e

Implementation point:

To support jumping without reserving memory. One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page). When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped. Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1f972768 26-Jul-2008 Ingo Molnar <mingo@elte.hu>

x86, RDC321x: add to mach-default

first step to add RDC321x support to the default PC architecture.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 7444a72e 25-Jul-2008 Michael Buesch <mb@bu3sch.de>

gpiolib: allow user-selection

This patch adds functionality to the gpio-lib subsystem to make it
possible to enable the gpio-lib code even if the architecture code didn't
request to get it built in.

The archtitecture code does still need to implement the gpiolib accessor
functions in its asm/gpio.h file. This patch adds the implementations for
x86 and PPC.

With these changes it is possible to run generic GPIO expansion cards on
every architecture that implements the trivial wrapper functions. Support
for more architectures can easily be added.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: David Brownell <david-b@pacbell.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Samuel Ortiz <sameo@openedhand.com>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 58340a07 25-Jul-2008 Johannes Berg <johannes@sipsolutions.net>

introduce HAVE_EFFICIENT_UNALIGNED_ACCESS Kconfig symbol

In many cases, especially in networking, it can be beneficial to know at
compile time whether the architecture can do unaligned accesses efficiently.
This patch introduces a new Kconfig symbol

HAVE_EFFICIENT_UNALIGNED_ACCESS

for that purpose and adds it to the powerpc and x86 architectures. Also add
some documentation about alignment and networking, and especially one intended
use of this symbol.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Ingo Molnar <mingo@elte.hu> [x86 architecture part]
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 28b2ee20 23-Jul-2008 Rik van Riel <riel@redhat.com>

access_process_vm device memory infrastructure

In order to be able to debug things like the X server and programs using
the PPC Cell SPUs, the debugger needs to be able to access device memory
through ptrace and /proc/pid/mem.

This patch:

Add the generic_access_phys access function and put the hooks in place
to allow access_process_vm to access device or PPC Cell SPU memory.

[riel@redhat.com: Add documentation for the vm_ops->access function]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Benjamin Herrensmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 41b9eb26 15-Jul-2008 Stefan Assmann <sassmann@suse.de>

x86, pci: introduce config option for pci reroute quirks (was: [PATCH 0/3] Boot IRQ quirks for Broadcom and AMD/ATI)

This is against linux-2.6-tip, branch pci-ioapic-boot-irq-quirks.

From: Stefan Assmann <sassmann@suse.de>
Subject: Introduce config option for pci reroute quirks

The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS is introduced to
enable (or disable) the redirection of the interrupt handler to the boot
interrupt line by default. Depending on the existence of interrupt
masking / threaded interrupt handling in the kernel (vanilla, rt, ...)
and the maturity of the rerouting patch, users can enable or disable the
redirection by default.

This means that the reroute quirk can be applied to any kernel without
changing it.

Interrupt sharing could be increased if this option is enabled. However this
option is vital for threaded interrupt handling, as done by the RT kernel.
It should simplify the consolidation with the RT kernel.

The option can be overridden by either pci=ioapicreroute or
pci=noioapicreroute.

Signed-off-by: Stefan Assmann <sassmann@suse.de>
Signed-off-by: Olaf Dabrunz <od@suse.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jon Masters <jonathan@jonmasters.org>
Cc: Ihno Krumreich <ihno@suse.de>
Cc: Sven Dietrich <sdietrich@suse.de>
Cc: Daniel Gollub <dgollub@suse.de>
Cc: Felix Foerster <ffoerster@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# caadbdce 15-Jul-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: enable memory tester support on 32-bit

only supports memory below max_low_pfn.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# aba3728c 15-Jul-2008 Thomas Gleixner <tglx@linutronix.de>

x86: sanitize Kconfig

Set default n for MEMTEST and MTRR_SANITIZER and fix the help texts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 9fa8c481 10-Jul-2008 Suresh Siddha <suresh.b.siddha@intel.com>

x64, x2apic/intr-remap: introduce CONFIG_INTR_REMAP

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: akpm@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: andi@firstfloor.org
Cc: ebiederm@xmission.com
Cc: jbarnes@virtuousgeek.org
Cc: steiner@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 39415a44 10-Jul-2008 Ingo Molnar <mingo@elte.hu>

x86, VisWS: fix pci_direct_conf1 dependency

fix:

arch/x86/pci/built-in.o: In function `pci_subsys_init':
visws.c:(.init.text+0xfc5): undefined reference to `pci_direct_conf1'

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c47277d2 10-Jul-2008 Ingo Molnar <mingo@elte.hu>

x86, VisWS: build fix

fix:

arch/x86/kernel/built-in.o: In function `visws_early_detect':
: undefined reference to `mach_get_smp_config_quirk'
arch/x86/kernel/built-in.o: In function `visws_early_detect':
: undefined reference to `mach_find_smp_config_quirk'

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b6770c83 10-Jul-2008 Ingo Molnar <mingo@elte.hu>

x86, VisWS: do not allow VisWS for Voyager

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# efefa6f6 10-Jul-2008 Ingo Molnar <mingo@elte.hu>

x86, VisWS: turn into generic arch, clean up

remove VISWS Kconfig complications, now that it's supported by the generic
architecture.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1b84e1c8 10-Jul-2008 Ingo Molnar <mingo@elte.hu>

x86, VisWS: turn into generic arch, flip over VISWS to generic arch

this is the big move: flip over VISWS to generic arch support.

From this commit on CONFIG_X86_VISWS is just another (default-disabled)
option that turns on certain quirks - no other complications.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5ab74722 10-Jul-2008 Ingo Molnar <mingo@elte.hu>

x86, VisWS: turn into generic arch, use generic mpparse code

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 31ac409a 10-Jul-2008 Ingo Molnar <mingo@elte.hu>

x86, VisWS: turn into generic arch, add early init quirks

add early init quirks for VisWS. This gradually turns the VISWS subarch
into a generic PC architecture.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a6784ad7 09-Jul-2008 Ingo Molnar <mingo@elte.hu>

x86: fix visws and vsmp build

these two sub-architectures want PCI to be default-on, not default-off.

Reported-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 18b743dc 09-Jul-2008 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

x86, AMD IOMMU: clean up Kconfig entry

AMD_IOMMU should depend on IOMMU_HELPER since they are the IOMMU
helper functions. SWIOTLB requires IOMMU_HELPER so declaring that
AMD_IOMMU depends on SWIOTLB properly fixes the problems.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6247943d 06-Jul-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: remove acpi_srat config v2

use ACPI_NUMA directly

and move srat_32.c to mm/

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 698839fe 06-Jul-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: remove have_arch_parse_srat -v2

we already have the same srat handling interface for 32bit.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 97349135 24-Jun-2008 Jeremy Fitzhardinge <jeremy@goop.org>

x86/paravirt: add debugging for missing operations

Rather than just jumping to 0 when there's a missing operation, raise a BUG.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d1b20afe 25-Jun-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: make x86_find_smp_config depends on 64 bit too

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 383bc5ce 26-Jun-2008 Bernhard Walle <bwalle@suse.de>

x86, crashdump, /proc/vmcore: remove CONFIG_EXPERIMENTAL from kdump

I would suggest to remove the "experimental" status from Kdump.
Kdump is now in the kernel since a long time and used by Enterprise
distributions. I don't think that "experimental" is true any more.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: vgoyal@redhat.com
Cc: kexec@lists.infradead.org
Cc: Bernhard Walle <bwalle@suse.de>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 23ca4bba 12-May-2008 Mike Travis <travis@sgi.com>

x86: cleanup early per cpu variables/accesses v4

* Introduce a new PER_CPU macro called "EARLY_PER_CPU". This is
used by some per_cpu variables that are initialized and accessed
before there are per_cpu areas allocated.

["Early" in respect to per_cpu variables is "earlier than the per_cpu
areas have been setup".]

This patchset adds these new macros:

DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)
EXPORT_EARLY_PER_CPU_SYMBOL(_name)
DECLARE_EARLY_PER_CPU(_type, _name)

early_per_cpu_ptr(_name)
early_per_cpu_map(_name, _idx)
early_per_cpu(_name, _cpu)

The DEFINE macro defines the per_cpu variable as well as the early
map and pointer. It also initializes the per_cpu variable and map
elements to "_initvalue". The early_* macros provide access to
the initial map (usually setup during system init) and the early
pointer. This pointer is initialized to point to the early map
but is then NULL'ed when the actual per_cpu areas are setup. After
that the per_cpu variable is the correct access to the variable.

The early_per_cpu() macro is not very efficient but does show how to
access the variable if you have a function that can be called both
"early" and "late". It tests the early ptr to be NULL, and if not
then it's still valid. Otherwise, the per_cpu variable is used
instead:

#define early_per_cpu(_name, _cpu) \
(early_per_cpu_ptr(_name) ? \
early_per_cpu_ptr(_name)[_cpu] : \
per_cpu(_name, _cpu))

A better method is to actually check the pointer manually. In the
case below, numa_set_node can be called both "early" and "late":

void __cpuinit numa_set_node(int cpu, int node)
{
int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);

if (cpu_to_node_map)
cpu_to_node_map[cpu] = node;
else
per_cpu(x86_cpu_to_node_map, cpu) = node;
}

* Add a flag "arch_provides_topology_pointers" that indicates pointers
to topology cpumask_t maps are available. Otherwise, use the function
returning the cpumask_t value. This is useful if cpumask_t set size
is very large to avoid copying data on to/off of the stack.

* The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while
the non-debug case has been optimized a bit.

* Remove an unreferenced compiler warning in drivers/base/topology.c

* Clean up #ifdef in setup.c

For inclusion into sched-devel/latest tree.

Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1184dc2f 12-May-2008 Mike Travis <travis@sgi.com>

x86: modify Kconfig to allow up to 4096 cpus

* Increase the limit of NR_CPUS to 4096 and introduce a boolean
called "MAXSMP" which when set (e.g. "allyesconfig"), will set
NR_CPUS = 4096 and NODES_SHIFT = 9 (512).

* Changed max setting for NODES_SHIFT from 15 to 9 to accurately
reflect the real limit.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# bad48f4b 20-Jun-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: simplify x86_mpparse dependency check

"Maciej W. Rozycki" <macro@linux-mips.org> said:

> Given X86_64 selects X86_LOCAL_APIC I am not sure the redundancy seen
>above does not actually obscure the logic behind... I think:
>
> depends on X86_LOCAL_APIC && !X86_VISWS
>
>would be clearer and get the same.

Suggested-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a4caa18e 19-Jun-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: fix compiling when CONFIG_X86_MPPARSE is not set

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6695c85b 19-Jun-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: let MPS support be selectable, v2

v2: seperate "fix for compiling when MPPARSE is not set" to another patch
make X86_MPPARSE to be selectable only when acpi is set and
X86_MPPARSE will be set if acpi is not set.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0699eae1 17-Jun-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: Kconfig cleanup with genericarch

we already have summit and etc depends on genericarch,
so use genericarch only.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 18d22200 03-Jul-2008 Joerg Roedel <joerg.roedel@amd.com>

x86, AMD IOMMU: more verbose Kconfig description text

This patch replaces the short description text for AMD IOMMU in Kconfig with a
more verbose one.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# efac4189 01-Jul-2008 Thomas Gleixner <tglx@linutronix.de>

x86: fix NODES_SHIFT Kconfig range

commit 4323838215184f5a2f081e0d17b8d60731b03164
x86: change size of node ids from u8 to s16

set the range for NODES_SHIFT to 1..15.

The possible range is 1..9

Fixes Bugzilla #10726

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 323ec001 29-Jun-2008 Dmitry Baryshkov <dbaryshkov@gmail.com>

x86: use generic per-device dma coherent allocator

Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 07c40e8a 27-Jun-2008 Ingo Molnar <mingo@elte.hu>

x86, AMD IOMMU: build fix #3

fix typo causing:

arch/x86/kernel/built-in.o: In function `__unmap_single':
amd_iommu.c:(.text+0x17771): undefined reference to `iommu_area_free'
arch/x86/kernel/built-in.o: In function `__map_single':
amd_iommu.c:(.text+0x1797a): undefined reference to `iommu_area_alloc'
amd_iommu.c:(.text+0x179a2): undefined reference to `iommu_area_alloc'

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 24d2ba0a 27-Jun-2008 Ingo Molnar <mingo@elte.hu>

x86, AMD IOMMU: build fix

fix:

arch/x86/kernel/amd_iommu_init.c:247: warning: 'struct acpi_table_header' declared inside parameter list
arch/x86/kernel/amd_iommu_init.c:247: warning: its scope is only this definition or declaration, which is probably not what you want
arch/x86/kernel/amd_iommu_init.c: In function 'find_last_devid_acpi':
arch/x86/kernel/amd_iommu_init.c:257: error: dereferencing pointer to incomplete type
arch/x86/kernel/amd_iommu_init.c:265: error: dereferencing pointer to incomplete type

the AMD IOMMU code depends on ACPI facilities.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2b188723 26-Jun-2008 Joerg Roedel <joerg.roedel@amd.com>

x86, AMD IOMMU: add Kconfig entry

This patch adds the Kconfig entry for the AMD IOMMU driver.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: Sebastian.Biemueller@amd.com
Cc: robert.richter@amd.com
Cc: joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3b16cf87 26-Jun-2008 Jens Axboe <jens.axboe@oracle.com>

x86: convert to generic helpers for IPI function calls

This converts x86, x86-64, and xen to use the new helpers for
smp_call_function() and friends, and adds support for
smp_call_function_single().

Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# f6e16d5a 03-Jun-2008 Gerd Hoffmann <kraxel@redhat.com>

x86: KVM guest: Use the paravirt clocksource structs and functions

This patch updates the kvm host code to use the pvclock structs
and functions, thereby making it compatible with Xen.

The patch also fixes an initialization bug: on SMP systems the
per-cpu has two different locations early at boot and after CPU
bringup. kvmclock must take that in account when registering the
physical address within the host.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


# 7af192c9 03-Jun-2008 Gerd Hoffmann <kraxel@redhat.com>

x86: Add structs and functions for paravirt clocksource

This patch adds structs for the paravirt clocksource ABI
used by both xen and kvm (pvclock-abi.h).

It also adds some helper functions to read system time and
wall clock time from a paravirtual clocksource (pvclock.[ch]).
They are based on the xen code. They are enabled using
CONFIG_PARAVIRT_CLOCK.

Subsequent patches of this series will put the code in use.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


# d49c4288 08-Jun-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: make generic arch support NUMAQ

... so it could fall back to normal numa and we'd reduce the impact of the
NUMAQ subarch.

NUMAQ depends on GENERICARCH
also decouple genericarch numa from acpi.
also make it fall back to bigsmp if apicid > 8.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2bdd1b03 05-Jun-2008 Andres Salomon <dilinger@queued.net>

PCI/x86: fix up PCI stuff so that PCI_GOANY supports OLPC

Previously, one would have to specifically choose CONFIG_OLPC and
CONFIG_PCI_GOOLPC in order to enable PCI_OLPC. That doesn't really work
for distro kernels, so this patch allows one to choose CONFIG_OLPC and
CONFIG_PCI_GOANY in order to build in OLPC support in a generic kernel (as
requested by Robert Millan).

This also moves GOOLPC before GOANY in the menuconfig list.

Finally, make pci_access_init return early if we detect OLPC hardware.
There's no need to continue probing stuff, and pci_pcbios_init
specifically trashes our settings (we didn't run into that before because
PCI_GOANY wasn't supported).

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 113c5413 14-Feb-2008 Ingo Molnar <mingo@elte.hu>

x86: unify stackprotector features

streamline the stackprotector features under a single option
and make the stronger feature the one accessible.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 72370f2a 13-Feb-2008 Ingo Molnar <mingo@elte.hu>

x86: if stackprotector is enabled, thn use stack-protector-all by default

also enable the rodata and nx tests.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 73531905 25-May-2008 Sam Ravnborg <sam@ravnborg.org>

Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LIST

init/Kconfig contains a list of configs that are searched
for if 'make *config' are used with no .config present.
Extend this list to look at the config identified by
ARCH_DEFCONFIG.

With this change we now try the defconfig targets last.

This fixes a regression reported
by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>


# 12031a62 02-May-2008 Yinghai Lu <yhlu.kernel.send@gmail.com>

x86: mtrr cleanup for converting continuous to discrete - auto detect v4

Loop through mtrr chunk_size and gran_size from 1M to 2G to find out
the optimal value so user does not need to add mtrr_chunk_size and
mtrr_gran_size to the kernel command line.

If optimal value is not found, print out all list to help select less
optimal value.

Add mtrr_spare_reg_nr= so user could set 2 instead of 1, if the card
need more entries.

v2: find the one with more spare entries
v3: fix hole_basek offset
v4: tight the compare between range and range_new
loop stop with 4g

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Mika Fischer <mika.fischer@zoopnet.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f5098d62 29-Apr-2008 Yinghai Lu <yhlu.kernel.send@gmail.com>

x86: mtrr cleanup for converting continuous to discrete layout v8 - fix

v9: address format change requests by Ingo
more case handling in range_to_var_with_hole

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 95ffa243 29-Apr-2008 Yinghai Lu <yhlu.kernel.send@gmail.com>

x86: mtrr cleanup for converting continuous to discrete layout, v8

some BIOS like to use continus MTRR layout, and X driver can not add
WB entries for graphical cards when 4g or more RAM installed.

the patch will change MTRR to discrete.

mtrr_chunk_size= could be used to have smaller continuous block to hold holes.
default is 256m, could be set according to size of graphics card memory.

mtrr_gran_size= could be used to send smallest mtrr block to avoid run out of MTRRs

v2: fix -1 for UC checking
v3: default to disable, and need use enable_mtrr_cleanup to enable this feature
skip the var state change warning.
remove next_basek in range_to_mtrr()
v4: correct warning mask.
v5: CONFIG_MTRR_SANITIZER
v6: fix 1g, 2g, 512 aligment with extra hole
v7: gran_sizek to prevent running out of MTRRs.
v8: fix hole_basek caculation caused when removing next_basek
gran_sizek using when basek is 0.

need to apply
[PATCH] x86: fix trimming e820 with MTRR holes.
right after this one.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1ac97018 19-May-2008 Ingo Molnar <mingo@elte.hu>

x86: untangle pci dependencies

make PCI-less subarches not build with PCI - instead of complicating
the PCI dependencies.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c9fea78d 19-May-2008 Ingo Molnar <mingo@elte.hu>

x86: make NUMAQ depend on PCI

NUMAQ depends on the existence of PCI hardware, and requiring PCI
makes this subarch simpler and avoids build failures if PCI is
disabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 677aa9f7 16-May-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: add have dynamic ftrace config for archs

Now that ftrace is being ported to other architectures, it has become
apparent that DYNAMIC_FTRACE is dependent on whether or not that
architecture implements dynamic ftrace. FTRACE itself may be ported to
an architecture without porting dynamic ftrace.

This patch adds HAVE_DYNAMIC_FTRACE to allow architectures to port ftrace
without having to also port the dynamic aspect as well.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 16444a8a 12-May-2008 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

ftrace: add basic support for gcc profiler instrumentation

If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
set to a non-zero value the ftrace routine will be called everytime
we enter a kernel function that is not marked with the "notrace"
attribute.

The ftrace routine will then call a registered function if a function
happens to be registered.

[ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
so don't blame Arnaldo for all of this ;-) ]

Update:
It is now possible to register more than one ftrace function.
If only one ftrace function is registered, that will be the
function that ftrace calls directly. If more than one function
is registered, then ftrace will call a function that will loop
through the functions to call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# c3ed6429 16-May-2008 Mike Travis <travis@sgi.com>

x86: change maximum NR_CPUS to 4096 and MAX_NUMNODES to 512

* Change the range of NR_CPUS from 2-255 to 2-4096 and change the
range of MAX_NUMNODES (NODES_SHIFT) from 1-32768 to 1-512.

* Alter comment about how much each increment of NR_CPUS consumes.
(This was found by configuring for 256 cpus and then 512 cpus
and dividing the difference by 256.)

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 03273184 18-Apr-2008 Yinghai Lu <yhlu.kernel.send@gmail.com>

x86_64: simplify the memtest parameter setting

use CONFIG_MEMTEST only. if it is set, will have memtest=0 (disabled)

need to have memtest=4 in command line to test more patterns.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 82fd8667 30-Apr-2008 Ingo Molnar <mingo@elte.hu>

x86: rdc: leds build/config fix

select NEW_LEDS for now until the Kconfig dependencies have been
fixed.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# ac44cc96 08-May-2008 Thomas Gleixner <tglx@linutronix.de>

x86: revert geode config dependency

commit e26a28d190304d910ee49b81cbfe6d9241f56e86
x86: olpc build fix

was a fix to a patch that was withdrawn/delayed and then erroneously
commited to x86.git. Revert it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# a5574cf6 05-May-2008 Ingo Molnar <mingo@elte.hu>

sched, x86: add HAVE_UNSTABLE_SCHED_CLOCK

add the HAVE_UNSTABLE_SCHED_CLOCK, for architectures to select.

the next change utilizes it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e26a28d1 03-May-2008 Thomas Gleixner <tglx@linutronix.de>

x86: olpc build fix

CONFIG_OLPC needs to depend on MGEODE_LX

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b9b39bfb 28-Apr-2008 Sam Ravnborg <sam@ravnborg.org>

x86: use defconfigs from x86/configs/*

Daniel Drake <dsd@gentoo.org> reported:

In 2.6.23, if you unpacked a kernel source tarball and then
ran "make menuconfig" you'd be presented with this message:
# using defaults found in arch/i386/defconfig

and the default options would be set.

The same thing in 2.6.24 does not give you any "using defaults" message, and
the default config options within menuconfig are rather blank (e.g. no PCI
support). You can work around this by explicitly running "make defconfig"
before menuconfig, but it would be nice to have the behaviour the way it was
for 2.6.23 (and the way it still is for other archs).

Fixed by adding a x86 specific defconfig list to Kconfig.

Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=10470
Tested-by: dsd@gentoo.org
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3e8f7e35 28-Apr-2008 Ingo Molnar <mingo@elte.hu>

x86 VISWS: build fix

the 'reboot_force' flag is a notion that non-PC subarchitectures do
not have.

also, unify the X86_BIOS_REBOOT option between 32-bit and 64-bit
and get rid of a few unnecessary Kconfig and Makefile complications
that way.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3ef0e1f8 29-Apr-2008 Andres Salomon <dilinger@queued.net>

x86: olpc: add One Laptop Per Child architecture support

This adds support for OLPC XO hardware. Open Firmware on XOs don't contain
the VSA, so it is necessary to emulate the PCI BARs in the kernel. This also
adds functionality for running EC commands, and a CONFIG_OLPC.

A number of OLPC drivers depend upon CONFIG_OLPC.

olpc_ec_timeout is a hack to work around Embedded Controller bugs.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: geode_has_vsa build fix]
[akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a8522509 29-Apr-2008 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

swiotlb: use iommu_is_span_boundary helper function

iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs
(commit 3715863aa142c4f4c5208f5f3e5e9bac06006d2f). SWIOTLB can use it instead
of the homegrown function.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7ae9392c 28-Apr-2008 Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

x86: configurable DMI scanning code

Turn CONFIG_DMI into a selectable option if EMBEDDED is defined, in
order to be able to remove the DMI table scanning code if it's not
needed, and then reduce the kernel code size.

With CONFIG_DMI (i.e before) :

text data bss dec hex filename
1076076 128656 98304 1303036 13e1fc vmlinux

Without CONFIG_DMI (i.e after) :

text data bss dec hex filename
1068092 126308 98304 1292704 13b9a0 vmlinux

Result:

text data bss dec hex filename
-7984 -2348 0 -10332 -285c vmlinux

The new option appears in "Processor type and features", only when
CONFIG_EMBEDDED is defined.

This patch is part of the Linux Tiny project, and is based on previous work
done by Matt Mackall <mpm@selenic.com>.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Anvin" <hpa@zytor.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1b27d05b 28-Apr-2008 Pekka Enberg <penberg@cs.helsinki.fi>

mm: move cache_line_size() to <linux/cache.h>

Not all architectures define cache_line_size() so as suggested by Andrew move
the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0cf1bfd2 21-Feb-2008 Marcelo Tosatti <mtosatti@redhat.com>

x86: KVM guest: add basic paravirt support

Add basic KVM paravirt support. Avoid vm-exits on IO delays.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


# 790c73f6 15-Feb-2008 Glauber de Oliveira Costa <gcosta@redhat.com>

x86: KVM guest: paravirtualized clocksource

This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.

We do use it, however, for fine-grained adjustment.

Other than that, time comes from the host.

[randy dunlap: add missing include]
[randy dunlap: disallow on Voyager or Visual WS]

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>


# 19870def 25-Apr-2008 Alexander van Heukelum <heukelum@mailshack.com>

x86, bitops: select the generic bitmap search functions

Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
lib/Kconfig, defaulting to off. An arch that wants to use the
generic implementation now only has to use a select statement
to include them.

I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
and used that to select the generic search functions. This
way ARCH=um SUBARCH=i386 automatically picks up the change
too, and arch/um/Kconfig.i386 can therefore be simplified a
bit. ARCH=um SUBARCH=x86_64 does things differently, but
still compiles fine. It seems that a "def_bool y" always
wins over a "def_bool n"?

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2aba6925 01-Apr-2008 Alexander van Heukelum <heukelum@mailshack.com>

x86: switch 64-bit to generic find_first_bit

Switch x86_64 to generic find_first_bit. The x86_64-specific
implementation is not removed.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 77b9bd9c 01-Apr-2008 Alexander van Heukelum <heukelum@mailshack.com>

x86: generic versions of find_first_(zero_)bit, convert i386

Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.

The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.

This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6fd92b63 09-Mar-2008 Alexander van Heukelum <heukelum@mailshack.com>

x86: change x86 to use generic find_next_bit

The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.

Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.

Athlon Xeon Opteron 32/64bit
x86-specific: 0m3.692s 0m2.820s 0m3.196s / 0m2.480s
generic: 0m2.622s 0m1.662s 0m2.100s / 0m1.572s

If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.

Using the generic version does give a slightly bigger kernel, though.

defconfig: text data bss dec hex filename
x86-specific: 4738555 481232 626688 5846475 5935cb vmlinux (32 bit)
generic: 4738621 481232 626688 5846541 59360d vmlinux (32 bit)
x86-specific: 5392395 846568 724424 6963387 6a40bb vmlinux (64 bit)
generic: 5392458 846568 724424 6963450 6a40fa vmlinux (64 bit)

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2a8a2719 26-Apr-2008 Ingo Molnar <mingo@elte.hu>

x86 PAT: decouple from nonpromisc devmem

Linus pointed it out that PAT should not depend on NONPROMISC_DEVMEM.

Also make PAT non-default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 765c68bd 09-Apr-2008 Ingo Molnar <mingo@elte.hu>

generic: make optimized inlining arch-opt-in

Stephen Rothwell reported that linux-next did not build on powerpc64.

make optimized inlining dependent on architecture opt-in.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8db979bc 26-Apr-2008 Ingo Molnar <mingo@elte.hu>

x86 PAT: decouple from nonpromisc devmem

Linus pointed it out that PAT should not depend on NONPROMISC_DEVMEM.

Also make PAT non-default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fcbc04c0 21-Apr-2008 Ingo Molnar <mingo@elte.hu>

x86: voyager fix

Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9f0e8d04 04-Apr-2008 Mike Travis <travis@sgi.com>

x86: convert cpumask_of_cpu macro to allocated array

* Here is a simple patch to use an allocated array of cpumasks to
represent cpumask_of_cpu() instead of constructing one on the stack.
It's based on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP" which is
currently only set for x86_64 SMP. Otherwise the the existing
cpumask_of_cpu() is used but has been changed to produce an lvalue
so a pointer to it can be used.

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6ec6e0d9 25-Mar-2008 Suresh Siddha <suresh.b.siddha@intel.com>

srat, x86: add support for nodes spanning other nodes

For example, If the physical address layout on a two node system with 8 GB
memory is something like:
node 0: 0-2GB, 4-6GB
node 1: 2-4GB, 6-8GB

Current kernels fail to boot/detect this NUMA topology.

ACPI SRAT tables can expose such a topology which needs to be supported.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 82da3ff8 17-Apr-2008 Ingo Molnar <mingo@elte.hu>

x86: kgdb support

simplified and streamlined kgdb support on x86, both 32-bit and 64-bit,
based on patch from:

Subject: kgdb: core-lite
From: Jason Wessel <jason.wessel@windriver.com>

[ and countless other authors - see the patch for details. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


# e44b7b75 10-Apr-2008 Pavel Machek <pavel@suse.cz>

x86: move suspend wakeup code to C

Move wakeup code to .c, so that video mode setting code can be shared
between boot and wakeup. Remove nasty assembly code in 64-bit case by
re-using trampoline code. Stack setup was fixed to clear high 16bits
of %esp, maybe that fixes some machines.

.c code sharing and morse code was done H. Peter Anvin, Sam Ravnborg
reviewed kbuild related stuff, and it seems okay to him. Rafael did
some cleanups.

[rjw:
* Made the patch stop breaking compilation on x86-32
* Added arch/x86/kernel/acpi/sleep.h
* Got rid of compiler warnings in arch/x86/kernel/acpi/sleep.c
* Fixed 32-bit compilation on x86-64 systems
* Added include/asm-x86/trampoline.h and fixed the non-SMP
compilation on 64-bit x86
* Removed arch/x86/kernel/acpi/sleep_32.c which was not used
* Fixed some breakage caused by the integration of smpboot.c done
under us in the meantime]

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f408b43c 01-Apr-2008 Randy Dunlap <randy.dunlap@oracle.com>

x86: fix VisualWS and Voyager kexec build failures

without this patch:

VOYAGER:
kernel/built-in.o: In function `crash_kexec':
(.text+0x28588): undefined reference to `machine_crash_shutdown'

VISWS:
kernel/built-in.o: In function `crash_kexec':
/next-20080401/kernel/kexec.c:1074: undefined reference to `machine_crash_shutdown'
make[1]: *** [.tmp_vmlinux1] Error 1

because arch/x86/kernel/reboot.c isn't built since CONFIG_X86_BIOS_REBOOT=n,
so machine_crash_shutdown() isn't available.

This patch does seem a small bit odd since the KEXEC help text says that
kexec is independent of the system firmware.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c64df707 21-Mar-2008 Yinghai Lu <yhlu.kernel.send@gmail.com>

x86: memtest bootparam

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fa3f1f42 21-Mar-2008 Jack Steiner <steiner@sgi.com>

x86: allow NODES_SHIFT to be a config option on x86_64

Allow the maximum number of nodes in an x86_64 system to
be configurable. This patch does NOT change the default value
but allows the value to be a config option.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 042b78e4 24-Mar-2008 Venki Pallipadi <venkatesh.pallipadi@intel.com>

x86: PAT infrastructure patch, documentation updates

Fix double help section in PAT Kconfig. Thanks to Randy Dunlap for catching
this bug.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2e5d9c85 18-Mar-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>

x86: PAT infrastructure patch

Sets up pat_init() infrastructure.

PAT MSR has following setting.
PAT
|PCD
||PWT
|||
000 WB _PAGE_CACHE_WB
001 WC _PAGE_CACHE_WC
010 UC- _PAGE_CACHE_UC_MINUS
011 UC _PAGE_CACHE_UC

We are effectively changing WT from boot time setting to WC.
UC_MINUS is used to provide backward compatibility to existing /dev/mem
users(X).

reserve_memtype and free_memtype are new interfaces for maintaining alias-free
mapping. It is currently implemented in a simple way with a linked list and
not optimized. reserve and free tracks the effective memory type, as a result
of PAT and MTRR setting rather than what is actually requested in PAT.

pat_init piggy backs on mtrr_init as the rules for setting both pat and mtrr
are same.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 4fe29a85 19-Mar-2008 Glauber de Oliveira Costa <gcosta@redhat.com>

x86: use specialized routine for setup per-cpu area

We use the same routing as x86_64, moved now to setup.c.
Just with a few ifdefs inside.
Note that this routing uses prefill_possible_map().
It has the very nice side effect of allowing hotplugging of
cpus that are marked as present but disabled by acpi bios.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 823c248e 28-Feb-2008 Roman Zippel <zippel@linux-m68k.org>

x86: fix recursive dependencies

The proper dependency check uncovered a few dependency problems,
the subarchitecture used a mixture of selects and depends on SMP
and PCI dependency was messed up.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b089c12b 27-Feb-2008 Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

x86: X86_HT always enable on X86_64 SMP

X86_HT is used for hyperthreading or multicore on 32-bit.
The X86_HT on 64-bit is different from 32-bit, it means hyperthreading only.
And X86_HT is not used on 64-bit except from cpu/initel_cacheinfo.c.

Unify X86_HT for hyperthreading or multicore.
Turn X86_HT on when X86_64 and SMP are enabled.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 96597fd2 11-Feb-2008 Glauber Costa <gcosta@redhat.com>

x86: introduce vsmp paravirt helpers

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 64ac24e7 07-Mar-2008 Matthew Wilcox <willy@infradead.org>

Generic semaphore implementation

Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>


# 53471121 12-Mar-2008 Randy Dunlap <randy.dunlap@oracle.com>

documentation: Move power-related files to Documentation/power/

Move 00-INDEX entries to power/00-INDEX (and add entry for
pm_qos_interface.txt).

Update references to moved filenames.

Fix some trailing whitespace.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 985a34bd 09-Mar-2008 Thomas Gleixner <tglx@linutronix.de>

x86: remove quicklists

quicklists cause a serious memory leak on 32-bit x86,
as documented at:

http://bugzilla.kernel.org/show_bug.cgi?id=9991

the reason is that the quicklist pool is a special-purpose
cache that grows out of proportion. It is not accounted for
anywhere and users have no way to even realize that it's
the quicklists that are causing RAM usage spikes. It was
supposed to be a relatively small pool, but as demonstrated
by KOSAKI Motohiro, they can grow as large as:

Quicklists: 1194304 kB

given how much trouble this code has caused historically,
and given that Andrew objected to its introduction on x86
(years ago), the best option at this point is to remove them.

[ any performance benefits of caching constructed pgds should
be implemented in a more generic way (possibly within the page
allocator), while still allowing constructed pages to be
allocated by other workloads. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9edddaa2 04-Mar-2008 Ananth N Mavinakayanahalli <ananth@in.ibm.com>

Kprobes: indicate kretprobe support in Kconfig

Add CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file for relevant
architectures with kprobes support. This facilitates easy handling of
in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on
kretprobes being present in the kernel.

Thanks to Sam Ravnborg for helping make the patch more lean.

Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1a4e3f89 20-Feb-2008 Randy Dunlap <randy.dunlap@oracle.com>

x86: disable KVM for Voyager and friends

Most classic Pentiums don't have hardware virtualization extension,
and building kvm with Voyager, Visual Workstation, or NUMAQ
generates spurious failures.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>


# 2c020a99 22-Feb-2008 Linus Torvalds <torvalds@woody.linux-foundation.org>

Mark CC_STACKPROTECTOR as being BROKEN

It's always been broken, but recent fixes actually made it do something,
and now the brokenness shows up as the resulting kernel simply not
working at all.

So it used to be that you could enable this config option, and it just
didn't do anything. Now we'd better stop people from enabling it by
mistake, since it _does_ do something, but does it so badly as to be
unusable.

Code to actually make it work is pending, but incomplete and won't be
merged into 2.6.25 in any case.

Acked-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: James Morris <jmorris@namei.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7d8330a5 09-Feb-2008 Balbir Singh <balbir@linux.vnet.ibm.com>

KVM is not seen under X86 config with latest git (32 bit compile)

The KVM configuration is no longer visible in the latest git tree. It looks
like it is selected by HAVE_SETUP_PER_CPU_AREA. I've moved HAVE_KVM to
under CONFIG_X86.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ec7748b5 09-Feb-2008 Sam Ravnborg <sam@ravnborg.org>

ide: introduce HAVE_IDE

To allow flexible configuration of IDE introduce HAVE_IDE.
All archs except arm, um and s390 unconditionally select it.
For arm the actual configuration determine if IDE is supported.

This is a step towards introducing drivers/Kconfig for arm.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Russell King - ARM Linux <linux@arm.linux.org.uk>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>


# a6869cc4 08-Feb-2008 Venki Pallipadi <venkatesh.pallipadi@intel.com>

cpuidle: build fix for non-x86

The last posted version of this patch gave compile error
on IA64. So, here goes yet another rewrite of the patch.

Convert cpu_idle_wait() to cpuidle_kick_cpus() which is
SMP-only, and gives error on non supported CPU.

Changes from last patch sent by Kevin:
Moved the definition of kick_cpus back to cpuidle.c from cpuidle.h:
* Having it in .h gives #error on archs which includes the header file without
actually having CPU_IDLE configured. To make it work in .h, we need one more
#ifdef around that code which makes it messy.
* Also, the function is only called from one file. So, it can be in declared
statically in .c rather than making it available to everyone who includes
the .h file.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Kevin Hilman <khilman@mvista.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# b0b933c0 08-Feb-2008 David Howells <dhowells@redhat.com>

aout: mark arches that support A.OUT format

Mark arches that support A.OUT format by including the following in their
master Kconfig files:

config ARCH_SUPPORTS_AOUT
def_bool y

This should also be set if the arch provides compatibility A.OUT support for
an older arch, for instance x86_64 for i386 or sparc64 for sparc.

I've guessed at which arches don't, based on comments in the code, however I'm
sure that some of the ones I've marked as 'yes' actually should be 'no'.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1f84260c 08-Jan-2008 Christoph Lameter <clameter@sgi.com>

SLUB: Alternate fast paths using cmpxchg_local

Provide an alternate implementation of the SLUB fast paths for alloc
and free using cmpxchg_local. The cmpxchg_local fast path is selected
for arches that have CONFIG_FAST_CMPXCHG_LOCAL set. An arch should only
set CONFIG_FAST_CMPXCHG_LOCAL if the cmpxchg_local is faster than an
interrupt enable/disable sequence. This is known to be true for both
x86 platforms so set FAST_CMPXCHG_LOCAL for both arches.

Currently another requirement for the fastpath is that the kernel is
compiled without preemption. The restriction will go away with the
introduction of a new per cpu allocator and new per cpu operations.

The advantages of a cmpxchg_local based fast path are:

1. Potentially lower cycle count (30%-60% faster)

2. There is no need to disable and enable interrupts on the fast path.
Currently interrupts have to be disabled and enabled on every
slab operation. This is likely avoiding a significant percentage
of interrupt off / on sequences in the kernel.

3. The disposal of freed slabs can occur with interrupts enabled.

The alternate path is realized using #ifdef's. Several attempts to do the
same with macros and inline functions resulted in a mess (in particular due
to the strange way that local_interrupt_save() handles its argument and due
to the need to define macros/functions that sometimes disable interrupts
and sometimes do something else).

[clameter: Stripped preempt bits and disabled fastpath if preempt is enabled]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 300ec130 07-Feb-2008 Bradley Smith <bradjsmith@btinternet.com>

I8K: add i8k driver to the x86_64 Kconfig

Adds i8k driver to the x86_64 Kconfig.

Signed-off-by: Bradley Smith <bradjsmith@btinternet.com>
Cc: Frank Sorenson <frank@tuxrocks.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9b713154 07-Feb-2008 Len Brown <len.brown@intel.com>

Revert "cpuidle: build fix for non-x86"

This reverts commit f757397097d0713c949af76dccabb65a2785782e.
which ironically broke the ia64 build


# 9a0b8415 31-Jan-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>

cpuidle: Add a poll_idle method

Add a default poll idle state with 0 latency. Provides an option to users
to use poll_idle by using 0 as the latency requirement.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9d8af78b 06-Feb-2008 Bernhard Walle <bwalle@suse.de>

rtc: add HPET RTC emulation to RTC_DRV_CMOS

That patch adds the RTC emulation of the HPET timer to the new RTC_DRV_CMOS.
The old drivers/char/rtc.ko driver had that functionality and it's important
on new systems.

[akpm@linux-foundation.org: unbreak alpha build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fde9a109 04-Feb-2008 FUJITA Tomonori <tomof@acm.org>

iommu sg: x86: convert gart IOMMU to use the IOMMU helper

This patch converts gart IOMMU to use the IOMMU helper functions. The
IOMMU doesn't allocate a memory area spanning LLD's segment boundary
anymore.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1b39b077 04-Feb-2008 FUJITA Tomonori <tomof@acm.org>

iommu sg: x86: convert calgary IOMMU to use the IOMMU helper

This patch converts calgary IOMMU to use the IOMMU helper
functions. The IOMMU doesn't allocate a memory area spanning LLD's
segment boundary anymore.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4cf31841 04-Feb-2008 Florian Fainelli <florian.fainelli@telecomint.eu>

x86: mach-rdc321x Kconfig fix

The mach-rdc321x uses the leds-gpio driver and explicitely
selects it, this driver also depends on the leds class module,
select it as well.

Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 03502faa 03-Feb-2008 Adrian Bunk <bunk@kernel.org>

remove Documentation/smp.txt

After seeing the filename I'd have expected something about the
implementation of SMP in the Linux kernel - not some notes on kernel
configuration and building trivialities noone would search at this
place.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>


# 125e5645 02-Feb-2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

Move Kconfig.instrumentation to arch/Kconfig and init/Kconfig

Move the instrumentation Kconfig to

arch/Kconfig for architecture dependent options
- oprofile
- kprobes

and

init/Kconfig for architecture independent options
- profiling
- markers

Remove the "Instrumentation Support" menu. Everything moves to "General setup".
Delete the kernel/Kconfig.instrumentation file.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 3f550096 02-Feb-2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

Add HAVE_KPROBES

Linus:

On the per-architecture side, I do think it would be better to *not* have
internal architecture knowledge in a generic file, and as such a line like

depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32

really shouldn't exist in a file like kernel/Kconfig.instrumentation.

It would be much better to do

depends on ARCH_SUPPORTS_KPROBES

in that generic file, and then architectures that do support it would just
have a

bool ARCH_SUPPORTS_KPROBES
default y

in *their* architecture files. That would seem to be much more logical,
and is readable both for arch maintainers *and* for people who have no
clue - and don't care - about which architecture is supposed to support
which interface...

Changelog:

Actually, I know I gave this as the magic incantation, but now that I see
it, I realize that I should have told you to just use

config KPROBES_SUPPORT
def_bool y

instead, which is a bit denser.

We seem to use both kinds of syntax for these things, but this is really
what "def_bool" is there for...

- Use HAVE_KPROBES
- Use a select

- Yet another update :
Moving to HAVE_* now.

- Update ARM for kprobes support.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 42d4b839 02-Feb-2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

Add HAVE_OPROFILE

Linus:
On the per-architecture side, I do think it would be better to *not* have
internal architecture knowledge in a generic file, and as such a line like

depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32

really shouldn't exist in a file like kernel/Kconfig.instrumentation.

It would be much better to do

depends on ARCH_SUPPORTS_KPROBES

in that generic file, and then architectures that do support it would just
have a

bool ARCH_SUPPORTS_KPROBES
default y

in *their* architecture files. That would seem to be much more logical,
and is readable both for arch maintainers *and* for people who have no
clue - and don't care - about which architecture is supposed to support
which interface...

Changelog:

Actually, I know I gave this as the magic incantation, but now that I see
it, I realize that I should have told you to just use

config ARCH_SUPPORTS_KPROBES
def_bool y

instead, which is a bit denser.

We seem to use both kinds of syntax for these things, but this is really
what "def_bool" is there for...

Changelog :

- Moving to HAVE_*.
- Add AVR32 oprofile.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# f4cb5700 07-Dec-2007 Johannes Berg <johannes@sipsolutions.net>

Suspend: Clean up Kconfig (V2)

This cleans up the suspend Kconfig and removes the need to
declare centrally which architectures support suspend. All
architectures that currently support suspend are modified
accordingly.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Pavel Machek <pavel@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>


# 801e4062 07-Dec-2007 Johannes Berg <johannes@sipsolutions.net>

Hibernation: Clean up Kconfig (V2)

This cleans up the hibernation Kconfig and removes the need to
declare centrally which architectures support hibernation. All
architectures that currently support hibernation are modified
accordingly.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Pavel Machek <pavel@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>


# 8f0e7d24 13-Dec-2007 Adrian Bunk <bunk@kernel.org>

PCI: Kconfig help: don't refer to the PCI-HOWTO

A HOWTO that hasn't been updated for half a dozen years no longer
"contains valuable information about which PCI hardware does work under
Linux and which doesn't".

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# aa7d9350 01-Feb-2008 Heiko Carstens <hca@linux.ibm.com>

latencytop: Change Kconfig dependency.

Change latencytop Kconfig entry so it doesn't list the archictectures
that support it. Instead introduce HAVE_LATENCY_SUPPORT which any
architecture can set. Should reduce patch conflicts.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Holger Wolf <wolf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f7573970 31-Jan-2008 Kevin Hilman <khilman@mvista.com>

cpuidle: build fix for non-x86

Convert cpu_idle_wait() to cpuidle_kick_cpus() macro which is
SMP-only, and gives error on non supported CPU.

Signed-off-by: Kevin Hilman <khilman@mvista.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>


# edf88417 16-Dec-2007 Avi Kivity <avi@qumranet.com>

KVM: Move arch dependent files to new directory arch/x86/kvm/

This paves the way for multiple architecture support. Note that while
ioapic.c could potentially be shared with ia64, it is also moved.

Signed-off-by: Avi Kivity <avi@qumranet.com>


# fb56dbb3 02-Dec-2007 Avi Kivity <avi@qumranet.com>

KVM: Export include/linux/kvm.h only if $ARCH actually supports KVM

Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h>
includes, not existing. Rather than add a zillion <asm/kvm.h>s, export kvm.h
only if the arch actually supports it.

Signed-off-by: Avi Kivity <avi@qumranet.com>


# 4716e79c 30-Jan-2008 Huang, Ying <ying.huang@intel.com>

x86: replace boot_ioremap() with enhanced bt_ioremap() - remove boot_ioremap()

This patch replaces boot_ioremap invokation with bt_ioremap and
removes the boot_ioremap implementation.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 5e3a77e9 30-Jan-2008 Florian Fainelli <florian.fainelli@telecomint.eu>

x86: add support for the RDC R-321x SoC

This patch adds support for the RDC R-321x system-on-chip,
also known as R-861x-(G). It uses the generic GPIO API and
has support for the on-chip hardware watchdog.

Build-fix from: Randy Dunlap <randy.dunlap@oracle.com>

Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# a6082959 30-Jan-2008 Florian Fainelli <florian.fainelli@telecomint.eu>

x86: add generic GPIO support to x86

This patch adds the generic GPIO support to the x86
architecture. We do the same as for MIPS, we let
the machine override the gpio callbacks and provide
defaults one in mach-generic.

Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# dd5af90a 30-Jan-2008 Mike Travis <travis@sgi.com>

x86/non-x86: percpu, node ids, apic ids x86.git fixup

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 42d545c9 30-Jan-2008 Eduardo Pereira Habkost <ehabkost@redhat.com>

x86: remove depends on X86_32 from PARAVIRT & PARAVIRT_GUEST

With this, the paravirt_ops code can be enabled on x86_64 also.

Each guest implementation (Xen, VMI, lguest) now depends on X86_32. The
dependencies can be dropped for each one when they start to support
x86_64.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# e61bd94a 30-Jan-2008 Eduardo Pereira Habkost <ehabkost@redhat.com>

x86: allow enabling PARAVIRT without any guest implementation

This will allow people to enable the paravirt_ops code even when no
guest support is enabled, for broader testing of the paravirt_ops code.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# f8f76481 30-Jan-2008 Bernhard Walle <bwalle@suse.de>

rtc: use the IRQ callback interface in (old) RTC driver

the previous patch in the old RTC driver. It also removes the direct
rtc_interrupt() call from arch/x86/kernel/hpetc.c so that there's finally no
(code) dependency to CONFIG_RTC in arch/x86/kernel/hpet.c.

Because of this, it's possible to compile the drivers/char/rtc.ko driver as
module and still use the HPET emulation functionality. This is also expressed
in Kconfig.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 43238382 30-Jan-2008 travis@sgi.com <travis@sgi.com>

x86: change size of node ids from u8 to s16

Change the size of node ids for X86_64 from u8 to s16 to
accomodate more than 32k nodes and allow for NUMA_NO_NODE
(-1) to be sign extended to int.

Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 409a7b85 30-Jan-2008 Mel Gorman <mel@csn.ul.ie>

x86: relax restrictions on setting CONFIG_NUMA on x86, #2

The FLATMEM memory model references a global mem_map and max_mapnr. This
is incompatible with how memory models used for NUMA view the world.
Builds fail if FLATMEM && NUMA are set on x86. This patch forbids that
combination of config items. This is consistent with x86_64
enforcements.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# b32ef636 30-Jan-2008 travis@sgi.com <travis@sgi.com>

percpu: use a kconfig variable to signal arch specific percpu setup

The use of the __GENERIC_PERCPU is a bit problematic since arches
may want to run their own percpu setup while using the generic
percpu definitions. Replace it through a kconfig variable.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a6b68076 30-Jan-2008 Andi Kleen <ak@linux.intel.com>

x86: compile apm and voyager module only when selected in Kconfig

Previously the complete files were #ifdef'ed, but now handle that in the
Makefile.

May save a minor bit of compilation time.

[ Stephen Rothwell <sfr@canb.auug.org.au>: build dependency fix ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1c858087 30-Jan-2008 Adrian Bunk <bunk@kernel.org>

x86: default to PCI=y

PCI is one of the few hardware stuff where defaulting to y makes sense.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 6b0c3d44 30-Jan-2008 Sam Ravnborg <sam@ravnborg.org>

x86: unify arch/x86/kernel/Makefile(s)

Combine the 32 and 64 bit specific Makefiles in one file.
While doing so link order was (almost) preserved on 32 bit
but on 64 bit link order changed a lot.

Patch was checked with defconfig + allyesconfig builds.
The same .o files were linked in these configurations.

To keep readability of the Makefiles a few Kconfig
symbols was added/modified and it was checked that
they were not used anywhere else.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 8b2cb7a8 30-Jan-2008 Huang, Ying <ying.huang@intel.com>

x86: 32-bit EFI runtime service support: fixes in sync with 64-bit support

support according to fixes of x86_64 support.

- Delete efi_rt_lock because it is used during system early boot,
before SMP is initialized.

- Change local_flush_tlb() to __flush_tlb_all() to flush global page
mapping.

- Clean up includes.

- Revise Kconfig description.

- Enable noefi kernel parameter on i386.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# a97f52e6 30-Jan-2008 Roland McGrath <roland@redhat.com>

x86: compat_binfmt_elf

This switches x86-64's 32-bit ELF support to use the shared
fs/compat_binfmt_elf.c code instead of our own ia32_binfmt.c.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 314cdbef 30-Jan-2008 Nick Piggin <npiggin@suse.de>

x86: FIFO ticket spinlocks

Introduce ticket lock spinlocks for x86 which are FIFO. The implementation
is described in the comments. The straight-line lock/unlock instruction
sequence is slightly slower than the dec based locks on modern x86 CPUs,
however the difference is quite small on Core2 and Opteron when working out of
cache, and becomes almost insignificant even on P4 when the lock misses cache.
trylock is more significantly slower, but they are relatively rare.

On an 8 core (2 socket) Opteron, spinlock unfairness is extremely noticable,
with a userspace test having a difference of up to 2x runtime per thread, and
some threads are starved or "unfairly" granted the lock up to 1 000 000 (!)
times. After this patch, all threads appear to finish at exactly the same
time.

The memory ordering of the lock does conform to x86 standards, and the
implementation has been reviewed by Intel and AMD engineers.

The algorithm also tells us how many CPUs are contending the lock, so
lockbreak becomes trivial and we no longer have to waste 4 bytes per
spinlock for it.

After this, we can no longer spin on any locks with preempt enabled
and cannot reenable interrupts when spinning on an irq safe lock, because
at that point we have already taken a ticket and the would deadlock if
the same CPU tries to take the lock again. These are questionable anyway:
if the lock happens to be called under a preempt or interrupt disabled section,
then it will just have the same latency problems. The real fix is to keep
critical sections short, and ensure locks are reasonably fair (which this
patch does).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 95c354fe 30-Jan-2008 Nick Piggin <npiggin@suse.de>

spinlock: lockbreak cleanup

The break_lock data structure and code for spinlocks is quite nasty.
Not only does it double the size of a spinlock but it changes locking to
a potentially less optimal trylock.

Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a
__raw_spin_is_contended that uses the lock data itself to determine whether
there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is
not set.

Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to
decouple it from the spinlock implementation, and make it typesafe (rwlocks
do not have any need_lockbreak sites -- why do they even get bloated up
with that break_lock then?).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 5b83683f 30-Jan-2008 Huang, Ying <ying.huang@intel.com>

x86: EFI runtime service support

This patch adds basic runtime services support for EFI x86_64 system. The
main file of the patch is the addition of efi_64.c for x86_64. This file is
modeled after the EFI IA32 avatar. EFI runtime services initialization are
implemented in efi_64.c. Some x86_64 specifics are worth noting here. On
x86_64, parameters passed to EFI firmware services need to follow the EFI
calling convention. For this purpose, a set of functions named efi_call<x>
(<x> is the number of parameters) are implemented. EFI function calls are
wrapped before calling the firmware service. The duplicated code between
efi_32.c and efi_64.c is placed in efi.c to remove them from efi_32.c.

Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3c2362e6 30-Jan-2008 Harvey Harrison <harvey.harrison@gmail.com>

x86: use def_bool where possible

Change occurances of:
bool
default X

to:
def_bool X

Change ocurances of:
bool "Foo"
default X

to:
def_bool X
prompt "Foo"

Shows no difference in generated config for allmodconfig/allyesconfig.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# b263295d 30-Jan-2008 Christoph Lameter <clameter@sgi.com>

x86: 64-bit, make sparsemem vmemmap the only memory model

Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements
indicate that DISCONTIGMEM has a higher overhead than sparsemem. And
FLATMEMs benefits are minimal. So I think its best to simply standardize
on sparsemem.

Results of page allocator tests (test can be had via git from slab git
tree branch tests)

Measurements in cycle counts. 1000 allocations were performed and then the
average cycle count was calculated.

Order FlatMem Discontig SparseMem
0 639 665 641
1 567 647 593
2 679 774 692
3 763 967 781
4 961 1501 962
5 1356 2344 1392
6 2224 3982 2336
7 4869 7225 5074
8 12500 14048 12732
9 27926 28223 28165
10 58578 58714 58682

(Note that FlatMem is an SMP config and the rest NUMA configurations)

Memory use:

SMP Sparsemem
-------------

Kernel size:

text data bss dec hex filename
3849268 397739 1264856 5511863 541ab7 vmlinux

total used free shared buffers cached
Mem: 8242252 41164 8201088 0 352 11512
-/+ buffers/cache: 29300 8212952
Swap: 9775512 0 9775512

SMP Flatmem
-----------

Kernel size:

text data bss dec hex filename
3844612 397739 1264536 5506887 540747 vmlinux

So 4.5k growth in text size vs. FLATMEM.

total used free shared buffers cached
Mem: 8244052 40544 8203508 0 352 11484
-/+ buffers/cache: 28708 8215344

2k growth in overall memory use after boot.

NUMA discontig:

text data bss dec hex filename
3888124 470659 1276504 5635287 55fcd7 vmlinux

total used free shared buffers cached
Mem: 8256256 56908 8199348 0 352 11496
-/+ buffers/cache: 45060 8211196
Swap: 9775512 0 9775512

NUMA sparse:

text data bss dec hex filename
3896428 470659 1276824 5643911 561e87 vmlinux

8k text growth. Given that we fully inline virt_to_page and friends now
that is rather good.

total used free shared buffers cached
Mem: 8264720 57240 8207480 0 352 11516
-/+ buffers/cache: 45372 8219348
Swap: 9775512 0 9775512

The total available memory is increased by 8k.

This patch makes sparsemem the default and removes discontig and
flatmem support from x86.

[ akpm@linux-foundation.org: allnoconfig build fix ]

Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# af65d648 30-Jan-2008 Roland McGrath <roland@redhat.com>

x86 vDSO: consolidate vdso32

This makes x86_64's ia32 emulation support share the sources used in the
32-bit kernel for the 32-bit vDSO and much of its setup code.

The 32-bit vDSO mapping now behaves the same on x86_64 as on native 32-bit.
The abi.syscall32 sysctl on x86_64 now takes the same values that
vm.vdso_enabled takes on the 32-bit kernel. That is, 1 means a randomized
vDSO location, 2 means the fixed old address. The CONFIG_COMPAT_VDSO
option is now available to make this the default setting, the same meaning
it has for the 32-bit kernel. (This does not affect the 64-bit vDSO.)

The argument vdso32=[012] can be used on both 32-bit and 64-bit kernels to
set this paramter at boot time. The vdso=[012] argument still does this
same thing on the 32-bit kernel.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3743d33e 06-Dec-2007 Linus Torvalds <torvalds@woody.linux-foundation.org>

Tiny clean-up of OPROFILE/KPROBES configuration

Make the Kconfig.instrumentation file a bit easier on the eyes, and use
the new ARCH_SUPPORTS_OPROFILE for x86[-64].

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ee0011a7 04-Dec-2007 Adrian Bunk <bunk@kernel.org>

x86: revert CONFIG_X86_HT semantics change

The recent Kconfig changes in x86 resulted in CONFIG_X86_HT no longer
being set if (X86_32 && MK8).

After grep'ing through the tree I think the problem is that different
places have different assumptions about the semantics of CONFIG_X86_HT,
either:

- hyperthreading or
- multicore

This should be sorted out properly, but until then we should keep the
2.6.23 status quo.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 6840999b 17-Nov-2007 Sam Ravnborg <sam@ravnborg.org>

x86: simplify "make ARCH=x86" and fix kconfig all.config

Simplify "make ARCH=x86" and fix kconfig so we again can set 64BIT in
all.config.

For a fix the diffstat is nice:
6 files changed, 3 insertions(+), 36 deletions(-)

The patch reverts these commits:
- 0f855aa64b3f63d35a891510cf7db932a435c116 ("kconfig: add helper to set
config symbol from environment variable")
- 2a113281f5cd2febbab21a93c8943f8d3eece4d3 ("kconfig: use $K64BIT to
set 64BIT with all*config targets")

Roman Zippel pointed out that kconfig supported string compares so
the additional complexity introduced by the above two patches were
not needed.

With this patch we have following behaviour:

# make {allno,allyes,allmod,rand}config [ARCH=...]
option \ host arch | 32bit | 64bit
=====================================================
./. | 32bit | 64bit
ARCH=x86 | 32bit | 32bit
ARCH=i386 | 32bit | 32bit
ARCH=x86_64 | 64bit | 64bit

The general rule are that ARCH= and native architecture takes
precedence over the configuration.

So make ARCH=i386 [whatever] will always build a 32-bit kernel
no matter what the configuration says. The configuration will
be updated to 32-bit if it was configured to 64-bit and the
other way around.

This behaviour is consistent with previous behaviour so no
suprises here.

make ARCH=x86 will per default result in a 32-bit kernel but as
the only ARCH= value x86 allow the user to select between 32-bit
and 64-bit using menuconfig.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andreas Herrmann <aherrman@arcor.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 80ef88d6 17-Nov-2007 Sam Ravnborg <sam@ravnborg.org>

x86: simplify "make ARCH=x86" and fix kconfig all.config

Simplify "make ARCH=x86" and fix kconfig so we again
can set 64BIT in all.config.

For a fix the diffstat is nice:
6 files changed, 3 insertions(+), 36 deletions(-)

The patch reverts these commits:
0f855aa64b3f63d35a891510cf7db932a435c116
-> kconfig: add helper to set config symbol from environment variable

2a113281f5cd2febbab21a93c8943f8d3eece4d3
-> kconfig: use $K64BIT to set 64BIT with all*config targets

Roman Zippel pointed out that kconfig supported string
compares so the additional complexity introduced by the
above two patches were not needed.

With this patch we have following behaviour:

# make {allno,allyes,allmod,rand}config [ARCH=...]
option \ host arch | 32bit | 64bit
=====================================================
./. | 32bit | 64bit
ARCH=x86 | 32bit | 32bit
ARCH=i386 | 32bit | 32bit
ARCH=x86_64 | 64bit | 64bit

The general rule are that ARCH= and native architecture
takes precedence over the configuration.
So make ARCH=i386 [whatever] will always build a 32-bit
kernel no matter what the configuration says.
The configuration will be updated to 32-bit if it was
configured to 64-bit and the other way around.

This behaviour is consistent with previous behaviour so
no suprises here.

make ARCH=x86 will per default result in a 32-bit kernel
but as the only ARCH= value x86 allow the user to select
between 32-bit and 64-bit using menuconfig.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andreas Herrmann <aherrman@arcor.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>


# daa93fab 12-Nov-2007 Sam Ravnborg <sam@ravnborg.org>

x86: enable "make ARCH=x86"

After unification of the Kconfig files and
introducing K64BIT support in kconfig
it required only trivial changes to enable
"make ARCH=x86".

With this patch you can build for x86_64 in several ways:
1) make ARCH=x86_64
2) make ARCH=x86 K64BIT=y
3) make ARCH=x86 menuconfig
=> select 64-bit

Likewise for i386 with the addition that
i386 is default is you say ARCH=x86.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>


# 506f1d07 09-Nov-2007 Sam Ravnborg <sam@ravnborg.org>

x86: move the rest of the menu's to Kconfig

With this patch we have all the Kconfig file shared
between i386 and x86_64.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>


# 8d5fffb9 06-Nov-2007 Sam Ravnborg <sam@ravnborg.org>

x86: move all simple arch settings to Kconfig

Most of the arch settings were equal so combine them
in the first part of Kconfig.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>


# bc0120fd 06-Nov-2007 Sam Ravnborg <sam@ravnborg.org>

x86: copy x86_64 specific Kconfig symbols to Kconfig.i386

No functional changes.
A prepatory step towards full unification.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>


# 1032c0ba 06-Nov-2007 Sam Ravnborg <sam@ravnborg.org>

x86: arch/x86/Kconfig.cpu unification

Move all CPU definitions to Kconfig.cpu
Always define X86_MINIMUM_CPU_FAMILY and do the
obvious code cleanup in boot/cpucheck.c

Comments from: Adrian Bunk <bunk@kernel.org> incorporated.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Brian Gerst <bgerst@didntduck.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>


# e279b6c1 06-Nov-2007 Sam Ravnborg <sam@ravnborg.org>

x86: start unification of arch/x86/Kconfig.*

This step introduces the file arch/x86/Kconfig
which contains all the menu's from "Power Management"
and below.

The main part of the new Kconfig file is shared
and the remaining i386/x86_64 specific symbols
are covered by dependencies.

A x86_64 allmodconfig build did not show any differences.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>