History log of /linux-master/arch/x86/include/asm/special_insns.h
Revision Date Author Comments
# 154fcf3a 04-Mar-2024 Thomas Gleixner <tglx@linutronix.de>

x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h>

To clean up the per CPU insanity of UP which causes sparse to be rightfully
unhappy and prevents the usage of the generic per CPU accessors on cpu_info
it is necessary to include <linux/percpu.h> into <asm/msr.h>.

Including <linux/percpu.h> into <asm/msr.h> is impossible because it ends
up in header dependency hell. The problem is that <asm/processor.h>
includes <asm/msr.h>. The inclusion of <linux/percpu.h> results in a
compile fail where the compiler cannot longer handle an include in
<asm/cpufeature.h> which references boot_cpu_data which is
defined in <asm/processor.h>.

The only reason why <asm/msr.h> is included in <asm/processor.h> are the
set/get_debugctlmsr() inlines. They are defined there because <asm/processor.h>
is such a nice dump ground for everything. In fact they belong obviously
into <asm/debugreg.h>.

Move them to <asm/debugreg.h> and fix up the resulting damage which is just
exposing the reliance on random include chains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240304005104.454678686@linutronix.de


# 5bdd1818 25-Jan-2024 Kai Huang <kai.huang@intel.com>

x86/asm: Remove the __iomem annotation of movdir64b()'s dst argument

Commit e56d28df2f66 ("x86/virt/tdx: Configure global KeyID on all
packages") causes a sparse warning:

arch/x86/virt/vmx/tdx/tdx.c:683:27: warning: incorrect type in argument 1 (different address spaces)
arch/x86/virt/vmx/tdx/tdx.c:683:27: expected void [noderef] __iomem *dst
arch/x86/virt/vmx/tdx/tdx.c:683:27: got void *

The reason is TDX must use the MOVDIR64B instruction to convert TDX
private memory (which is normal RAM but not MMIO) back to normal. The
TDX code uses existing movdir64b() helper to do that, but the first
argument @dst of movdir64b() is annotated with __iomem.

When movdir64b() was firstly introduced in commit 0888e1030d3e
("x86/asm: Carve out a generic movdir64b() helper for general usage"),
it didn't have the __iomem annotation. But this commit also introduced
the same "incorrect type" sparse warning because the iosubmit_cmds512(),
which was the solo caller of movdir64b(), has the __iomem annotation.

This was later fixed by commit 6ae58d871319 ("x86/asm: Annotate
movdir64b()'s dst argument with __iomem"). That fix was reasonable
because until TDX code the movdir64b() was only used to move data to
MMIO location, as described by the commit message:

... The current usages send a 64-bytes command descriptor to an MMIO
location (portal) on a device for consumption. When future usages for
the MOVDIR64B instruction warrant a separate variant of a memory to
memory operation, the argument annotation can be revisited.

Now TDX code uses MOVDIR64B to move data to normal memory so it's time
to revisit.

The SDM says the destination of MOVDIR64B is "memory location specified
in a general register", thus it's more reasonable that movdir64b() does
not have the __iomem annotation on the @dst.

Remove the __iomem annotation from the @dst argument of movdir64b() to
fix the sparse warning in TDX code. Similar to memset_io(), introduce a
new movdir64b_io() to cover the case where the destination is an MMIO
location, and change the solo caller iosubmit_cmds512() to use the new
movdir64b_io().

In movdir64b_io() explicitly use __force in the type casting otherwise
there will be below sparse warning:

warning: cast removes address space '__iomem' of expression

[ dhansen: normal changelog tweaks ]

Closes: https://lore.kernel.org/oe-kbuild-all/202312311924.tGjsBIQD-lkp@intel.com/
Fixes: e56d28df2f66 ("x86/virt/tdx: Configure global KeyID on all packages")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/all/20240126023852.11065-1-kai.huang%40intel.com


# 4356e9f8 09-Feb-2024 Linus Torvalds <torvalds@linux-foundation.org>

work around gcc bugs with 'asm goto' with outputs

We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f53 ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").

Then, much later, we ended up removing the workaround in commit
43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.

Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround. But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.

It looks like there are at least two separate issues that all hit in
this area:

(a) some versions of gcc don't mark the asm goto as 'volatile' when it
has outputs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420

which is easy to work around by just adding the 'volatile' by hand.

(b) Internal compiler errors:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

which are worked around by adding the extra empty 'asm' as a
barrier, as in the original workaround.

but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.

but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.

Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 92805476 12-Jun-2023 Rick Edgecombe <rick.p.edgecombe@intel.com>

x86/shstk: Introduce routines modifying shstk

Shadow stacks are normally written to via CALL/RET or specific CET
instructions like RSTORSSP/SAVEPREVSSP. However, sometimes the kernel will
need to write to the shadow stack directly using the ring-0 only WRUSS
instruction.

A shadow stack restore token marks a restore point of the shadow stack, and
the address in a token must point directly above the token, which is within
the same shadow stack. This is distinctively different from other pointers
on the shadow stack, since those pointers point to executable code area.

Introduce token setup and verify routines. Also introduce WRUSS, which is
a kernel-mode instruction but writes directly to user shadow stack.

In future patches that enable shadow stack to work with signals, the kernel
will need something to denote the point in the stack where sigreturn may be
called. This will prevent attackers calling sigreturn at arbitrary places
in the stack, in order to help prevent SROP attacks.

To do this, something that can only be written by the kernel needs to be
placed on the shadow stack. This can be accomplished by setting bit 63 in
the frame written to the shadow stack. Userspace return addresses can't
have this bit set as it is in the kernel range. It also can't be a valid
restore token.

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230613001108.3040476-31-rick.p.edgecombe%40intel.com


# 10a09940 12-Jan-2023 Peter Zijlstra <peterz@infradead.org>

cpuidle, xenpv: Make more PARAVIRT_XXL noinstr clean

objtool found a few cases where this code called out into instrumented
code:

vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() leaves .noinstr.text section
vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() leaves .noinstr.text section
vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section

Solve this by:

- marking arch_safe_halt(), wbinvd(), native_wbinvd() and
HYPERVISOR_sched_op() as __always_inline().

- Explicitly uninlining xen_safe_halt() and pv_native_wbinvd() [they were
already uninlined by the compiler on use as function pointers] and
annotating them as 'noinstr'.

- Annotating pv_native_safe_halt() as 'noinstr'.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195541.171918174@infradead.org


# 2ec8efe6 12-Jan-2023 Peter Zijlstra <peterz@infradead.org>

cpuidle, mwait: Make the mwait code noinstr clean

objtool found a few cases where this code called out into instrumented
code:

vmlinux.o: warning: objtool: intel_idle_s2idle+0x6e: call to __monitor.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0x8c: call to __monitor.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0x73: call to __monitor.constprop.0() leaves .noinstr.text section

vmlinux.o: warning: objtool: mwait_idle+0x88: call to clflush() leaves .noinstr.text section

Fix it by marking the affected methods as __always_inline.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195541.050542952@infradead.org


# 821ad23d 12-Jan-2023 Peter Zijlstra <peterz@infradead.org>

cpuidle, intel_idle: Fix CPUIDLE_FLAG_INIT_XSTATE

Fix instrumentation bugs objtool found:

vmlinux.o: warning: objtool: intel_idle_s2idle+0xd5: call to fpu_idle_fpregs() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_xstate+0x11: call to fpu_idle_fpregs() leaves .noinstr.text section
vmlinux.o: warning: objtool: fpu_idle_fpregs+0x9: call to xfeatures_in_use() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.494977795@infradead.org


# ae53fa18 12-Jan-2023 H. Peter Anvin (Intel) <hpa@zytor.com>

x86/gsseg: Move load_gs_index() to its own new header file

GS is a special segment on x86_64, move load_gs_index() to its own new
header file to simplify header inclusion.

No change in functionality.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230112072032.35626-5-xin3.li@intel.com


# df729fb0 12-Jan-2023 H. Peter Anvin (Intel) <hpa@zytor.com>

x86/gsseg: Make asm_load_gs_index() take an u16

Let GCC know that only the low 16 bits of load_gs_index() argument
actually matter. It might allow it to create slightly better
code. However, do not propagate this into the prototypes of functions
that end up being paravirtualized, to avoid unnecessary changes.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230112072032.35626-4-xin3.li@intel.com


# f17b1687 08-Jun-2022 Chang S. Bae <chang.seok.bae@intel.com>

x86/fpu: Add a helper to prepare AMX state for low-power CPU idle

When a CPU enters an idle state, a non-initialized AMX register state may
be the cause of preventing a deeper low-power state. Other extended
register states whether initialized or not do not impact the CPU idle
state.

The new helper can ensure the AMX state is initialized before the CPU is
idle, and it will be used by the intel idle driver.

Check the AMX_TILE feature bit before using XGETBV1 as a chain of
dependencies was established via cpuid_deps[]: AMX->XFD->XGETBV1.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20220608164748.11864-2-chang.seok.bae@intel.com


# 203d8919 25-Mar-2022 Brian Gerst <brgerst@gmail.com>

x86/asm: Merge load_gs_index()

Merge the 32- and 64-bit implementations of load_gs_index().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20220325153953.162643-5-brgerst@gmail.com


# d81ff5fe 10-Sep-2021 Kees Cook <keescook@chromium.org>

x86/asm: Fix SETZ size enqcmds() build failure

When building under GCC 4.9 and 5.5:

arch/x86/include/asm/special_insns.h: Assembler messages:
arch/x86/include/asm/special_insns.h:286: Error: operand size mismatch for `setz'

Change the type to "bool" for condition code arguments, as documented.

Fixes: 7f5933f81bd8 ("x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction")
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210910223332.3224851-1-keescook@chromium.org


# 72a6c08c 23-Jun-2021 Thomas Gleixner <tglx@linutronix.de>

x86/pkru: Remove xstate fiddling from write_pkru()

The PKRU value of a task is stored in task->thread.pkru when the task is
scheduled out. PKRU is restored on schedule in from there. So keeping the
XSAVE buffer up to date is a pointless exercise.

Remove the xstate fiddling and cleanup all related functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121456.897372712@linutronix.de


# a89dfde3 11-Mar-2021 Peter Zijlstra <peterz@infradead.org>

x86: Remove dynamic NOP selection

This ensures that a NOP is a NOP and not a random other instruction that
is also a NOP. It allows simplification of dynamic code patching that
wants to verify existing code before writing new instructions (ftrace,
jump_label, static_call, etc..).

Differentiating on NOPs is not a feature.

This pessimises 32bit (DONTCARE) and 32bit on 64bit CPUs (CARELESS).
32bit is not a performance target.

Everything x86_64 since AMD K10 (2007) and Intel IvyBridge (2012) is
fine with using NOPL (as opposed to prefix NOP). And per FEATURE_NOPL
being required for x86_64, all x86_64 CPUs can use NOPL. So stop
caring about NOPs, simplify things and get on with life.

[ The problem seems to be that some uarchs can only decode NOPL on a
single front-end port while others have severe decode penalties for
excessive prefixes. All modern uarchs can handle both, except Atom,
which has prefix penalties. ]

[ Also, much doubt you can actually measure any of this on normal
workloads. ]

After this, FEATURE_NOPL is unused except for required-features for
x86_64. FEATURE_K8 is only used for PTI.

[ bp: Kernel build measurements showed ~0.3s slowdown on Sandybridge
which is hardly a slowdown. Get rid of X86_FEATURE_K7, while at it. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> # bpf
Acked-by: Linus Torvalds <torvalds@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20210312115749.065275711@infradead.org


# 5c99720b 07-Jan-2021 Dave Jiang <dave.jiang@intel.com>

x86/asm: Add a missing __iomem annotation in enqcmds()

Add a missing __iomem annotation to address a sparse warning. The caller
is expected to pass an __iomem annotated pointer to this function. The
current usages send a 64-bytes command descriptor to an MMIO location
(portal) on a device for consumption.

Also, from the comment in movdir64b(), which also applies to enqcmds(),
@__dst must be supplied as an lvalue because this tells the compiler
what the object is (its size) the instruction accesses. I.e., not the
pointers but what they point to, thus the deref'ing '*'."

The actual sparse warning is:

drivers/dma/idxd/submit.c: note: in included file (through arch/x86/include/asm/processor.h, \
arch/x86/include/asm/timex.h, include/linux/timex.h, include/linux/time32.h, \
include/linux/time.h, include/linux/stat.h, ...):
./arch/x86/include/asm/special_insns.h:289:41: warning: incorrect type in initializer (different address spaces)
./arch/x86/include/asm/special_insns.h:289:41: expected struct <noident> *__dst
./arch/x86/include/asm/special_insns.h:289:41: got void [noderef] __iomem *dst

[ bp: Massage commit message. ]

Fixes: 7f5933f81bd8 ("x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lkml.kernel.org/r/161003789741.4062451.14362269365703761223.stgit@djiang5-desk3.ch.intel.com


# 6ae58d87 07-Jan-2021 Dave Jiang <dave.jiang@intel.com>

x86/asm: Annotate movdir64b()'s dst argument with __iomem

Add a missing __iomem annotation to address a sparse warning. The caller
is expected to pass an __iomem annotated pointer to this function. The
current usages send a 64-bytes command descriptor to an MMIO location
(portal) on a device for consumption. When future usages for the
MOVDIR64B instruction warrant a separate variant of a memory to memory
operation, the argument annotation can be revisited.

Also, from the comment in movdir64b() @__dst must be supplied as an
lvalue because this tells the compiler what the object is (its size) the
instruction accesses. I.e., not the pointers but what they point to,
thus the deref'ing '*'."

The actual sparse warning is:

sparse warnings: (new ones prefixed by >>)
drivers/dma/idxd/submit.c: note: in included file (through include/linux/io.h, include/linux/pci.h):
>> arch/x86/include/asm/io.h:422:27: sparse: sparse: incorrect type in \
argument 1 (different address spaces)
@@ expected void *dst
@@ got void [noderef] __iomem *dst @@
arch/x86/include/asm/io.h:422:27: sparse: expected void *dst
arch/x86/include/asm/io.h:422:27: sparse: got void [noderef] __iomem *dst

[ bp: Massage commit message. ]

Fixes: 0888e1030d3e ("x86/asm: Carve out a generic movdir64b() helper for general usage")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lkml.kernel.org/r/161003787823.4062451.6564503265464317197.stgit@djiang5-desk3.ch.intel.com


# 7f5933f8 05-Oct-2020 Dave Jiang <dave.jiang@intel.com>

x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction

Currently, the MOVDIR64B instruction is used to atomically submit
64-byte work descriptors to devices. Although it can encounter errors
like device queue full, command not accepted, device not ready, etc when
writing to a device MMIO, MOVDIR64B can not report back on errors from
the device itself. This means that MOVDIR64B users need to separately
interact with a device to see if a descriptor was successfully queued,
which slows down device interactions.

ENQCMD and ENQCMDS also atomically submit 64-byte work descriptors
to devices. But, they *can* report back errors directly from the
device, such as if the device was busy, or device not enabled or does
not support the command. This immediate feedback from the submission
instruction itself reduces the number of interactions with the device
and can greatly increase efficiency.

ENQCMD can be used at any privilege level, but can effectively only
submit work on behalf of the current process. ENQCMDS is a ring0-only
instruction and can explicitly specify a process context instead of
being tied to the current process or needing to reprogram the IA32_PASID
MSR.

Use ENQCMDS for work submission within the kernel because a Process
Address ID (PASID) is setup to translate the kernel virtual address
space. This PASID is provided to ENQCMDS from the descriptor structure
submitted to the device and not retrieved from IA32_PASID MSR, which is
setup for the current user address space.

See Intel Software Developer’s Manual for more information on the
instructions.

[ bp:
- Make operand constraints like movdir64b() because both insns are
basically doing the same thing, more or less.
- Fixup comments and cleanup. ]

Link: https://lkml.kernel.org/r/20200924180041.34056-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20201005151126.657029-3-dave.jiang@intel.com


# 0888e103 05-Oct-2020 Dave Jiang <dave.jiang@intel.com>

x86/asm: Carve out a generic movdir64b() helper for general usage

Carve out the MOVDIR64B inline asm primitive into a generic helper so
that it can be used by other functions. Move it to special_insns.h and
have iosubmit_cmds512() call it.

[ bp: Massage commit message. ]

Suggested-by: Michael Matz <matz@suse.de>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201005151126.657029-2-dave.jiang@intel.com


# aa5cacdc 02-Sep-2020 Arvind Sankar <nivedita@alum.mit.edu>

x86/asm: Replace __force_order with a memory clobber

The CRn accessor functions use __force_order as a dummy operand to
prevent the compiler from reordering CRn reads/writes with respect to
each other.

The fact that the asm is volatile should be enough to prevent this:
volatile asm statements should be executed in program order. However GCC
4.9.x and 5.x have a bug that might result in reordering. This was fixed
in 8.1, 7.3 and 6.5. Versions prior to these, including 5.x and 4.9.x,
may reorder volatile asm statements with respect to each other.

There are some issues with __force_order as implemented:
- It is used only as an input operand for the write functions, and hence
doesn't do anything additional to prevent reordering writes.
- It allows memory accesses to be cached/reordered across write
functions, but CRn writes affect the semantics of memory accesses, so
this could be dangerous.
- __force_order is not actually defined in the kernel proper, but the
LLVM toolchain can in some cases require a definition: LLVM (as well
as GCC 4.9) requires it for PIE code, which is why the compressed
kernel has a definition, but also the clang integrated assembler may
consider the address of __force_order to be significant, resulting in
a reference that requires a definition.

Fix this by:
- Using a memory clobber for the write functions to additionally prevent
caching/reordering memory accesses across CRn writes.
- Using a dummy input operand with an arbitrary constant address for the
read functions, instead of a global variable. This will prevent reads
from being reordered across writes, while allowing memory loads to be
cached/reordered across CRn reads, which should be safe.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602
Link: https://lore.kernel.org/lkml/20200527135329.1172644-1-arnd@arndb.de/
Link: https://lkml.kernel.org/r/20200902232152.3709896-1-nivedita@alum.mit.edu


# bf9c912f 06-Aug-2020 Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

x86/cpu: Use SERIALIZE in sync_core() when available

The SERIALIZE instruction gives software a way to force the processor to
complete all modifications to flags, registers and memory from previous
instructions and drain all buffered writes to memory before the next
instruction is fetched and executed. Thus, it serves the purpose of
sync_core(). Use it when available.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200807032833.17484-1-ricardo.neri-calderon@linux.intel.com


# f69ca629 26-Jul-2020 Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

x86/cpu: Refactor sync_core() for readability

Instead of having #ifdef/#endif blocks inside sync_core() for X86_64 and
X86_32, implement the new function iret_to_self() with two versions.

In this manner, avoid having to use even more more #ifdef/#endif blocks
when adding support for SERIALIZE in sync_core().

Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200727043132.15082-4-ricardo.neri-calderon@linux.intel.com


# 2823e83a 03-Jun-2020 Peter Zijlstra <peterz@infradead.org>

x86/entry: __always_inline CR2 for noinstr

vmlinux.o: warning: objtool: exc_page_fault()+0x9: call to read_cr2() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_page_fault()+0x24: call to prefetchw() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_page_fault()+0x21: call to kvm_handle_async_pf.isra.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_nmi()+0x1cc: call to write_cr2() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200603114052.243227806@infradead.org


# 410367e3 04-Mar-2020 Thomas Gleixner <tglx@linutronix.de>

x86/entry: Disable interrupts for native_load_gs_index() in C code

There is absolutely no point in doing this in ASM code. Move it to C.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134903.531534675@linutronix.de


# 83b584d9 15-Jul-2019 Andrew Cooper <andrew.cooper3@citrix.com>

x86/paravirt: Drop {read,write}_cr8() hooks

There is a lot of infrastructure for functionality which is used
exclusively in __{save,restore}_processor_state() on the suspend/resume
path.

cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
lapic_{suspend,resume}(). Saving and restoring cr8 independently of the
rest of the Local APIC state isn't a clever thing to be doing.

Delete the suspend/resume cr8 handling, which shrinks the size of struct
saved_context, and allows for the removal of both PVOPS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20190715151641.29210-1-andrew.cooper3@citrix.com


# 7652ac92 10-Jul-2019 Thomas Gleixner <tglx@linutronix.de>

x86/asm: Move native_write_cr0/4() out of line

The pinning of sensitive CR0 and CR4 bits caused a boot crash when loading
the kvm_intel module on a kernel compiled with CONFIG_PARAVIRT=n.

The reason is that the static key which controls the pinning is marked RO
after init. The kvm_intel module contains a CR4 write which requires to
update the static key entry list. That obviously does not work when the key
is in a RO section.

With CONFIG_PARAVIRT enabled this does not happen because the CR4 write
uses the paravirt indirection and the actual write function is built in.

As the key is intended to be immutable after init, move
native_write_cr0/4() out of line.

While at it consolidate the update of the cr4 shadow variable and store the
value right away when the pinning is initialized on a booting CPU. No point
in reading it back 20 instructions later. This allows to confine the static
key and the pinning variable to cpu/common and allows to mark them static.

Fixes: 8dbec27a242c ("x86/asm: Pin sensitive CR0 bits")
Fixes: 873d50d58f67 ("x86/asm: Pin sensitive CR4 bits")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Xi Ruoyao <xry111@mengyan1223.wang>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Xi Ruoyao <xry111@mengyan1223.wang>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907102140340.1758@nanos.tec.linutronix.de


# 8dbec27a 17-Jun-2019 Kees Cook <keescook@chromium.org>

x86/asm: Pin sensitive CR0 bits

With sensitive CR4 bits pinned now, it's possible that the WP bit for
CR0 might become a target as well.

Following the same reasoning for the CR4 pinning, pin CR0's WP
bit. Contrary to the cpu feature dependend CR4 pinning this can be done
with a constant value.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: https://lkml.kernel.org/r/20190618045503.39105-4-keescook@chromium.org


# 873d50d5 17-Jun-2019 Kees Cook <keescook@chromium.org>

x86/asm: Pin sensitive CR4 bits

Several recent exploits have used direct calls to the native_write_cr4()
function to disable SMEP and SMAP before then continuing their exploits
using userspace memory access.

Direct calls of this form can be mitigate by pinning bits of CR4 so that
they cannot be changed through a common function. This is not intended to
be a general ROP protection (which would require CFI to defend against
properly), but rather a way to avoid trivial direct function calling (or
CFI bypasses via a matching function prototype) as seen in:

https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
(https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308)

The goals of this change:

- Pin specific bits (SMEP, SMAP, and UMIP) when writing CR4.

- Avoid setting the bits too early (they must become pinned only after
CPU feature detection and selection has finished).

- Pinning mask needs to be read-only during normal runtime.

- Pinning needs to be checked after write to validate the cr4 state

Using __ro_after_init on the mask is done so it can't be first disabled
with a malicious write.

Since these bits are global state (once established by the boot CPU and
kernel boot parameters), they are safe to write to secondary CPUs before
those CPUs have finished feature detection. As such, the bits are set at
the first cr4 write, so that cr4 write bugs can be detected (instead of
silently papered over). This uses a few bytes less storage of a location we
don't have: read-only per-CPU data.

A check is performed after the register write because an attack could just
skip directly to the register write. Such a direct jump is possible because
of how this function may be built by the compiler (especially due to the
removal of frame pointers) where it doesn't add a stack frame (function
exit may only be a retq without pops) which is sufficient for trivial
exploitation like in the timer overwrites mentioned above).

The asm argument constraints gain the "+" modifier to convince the compiler
that it shouldn't make ordering assumptions about the arguments or memory,
and treat them as changed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: https://lkml.kernel.org/r/20190618045503.39105-3-keescook@chromium.org


# 577ff465 03-Apr-2019 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

x86/fpu: Only write PKRU if it is different from current

According to Dave Hansen, WRPKRU is more expensive than RDPKRU. It has
a higher cycle cost and it's also practically a (light) speculation
barrier.

As an optimisation read the current PKRU value and only write the new
one if it is different.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190403164156.19645-14-bigeasy@linutronix.de


# c806e887 03-Apr-2019 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

x86/pkeys: Provide *pkru() helpers

Dave Hansen asked for __read_pkru() and __write_pkru() to be
symmetrical.

As part of the series __write_pkru() will read back the value and only
write it if it is different.

In order to make both functions symmetrical, move the function
containing only the opcode asm into a function called like the
instruction itself.

__write_pkru() will just invoke wrpkru() but in a follow-up patch will
also read back the value.

[ bp: Convert asm opcode wrapper names to rd/wrpkru(). ]

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190403164156.19645-13-bigeasy@linutronix.de


# fdc0269e 28-Aug-2018 Juergen Gross <jgross@suse.com>

x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella

Most of the paravirt ops defined in pv_mmu_ops are for Xen PV guests
only. Define them only if CONFIG_PARAVIRT_XXL is set.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: akataria@vmware.com
Cc: rusty@rustcorp.com.au
Cc: boris.ostrovsky@oracle.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180828074026.820-15-jgross@suse.com


# 9bad5658 28-Aug-2018 Juergen Gross <jgross@suse.com>

x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella

Most of the paravirt ops defined in pv_cpu_ops are for Xen PV guests
only. Define them only if CONFIG_PARAVIRT_XXL is set.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: akataria@vmware.com
Cc: rusty@rustcorp.com.au
Cc: boris.ostrovsky@oracle.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180828074026.820-13-jgross@suse.com


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 87930019 03-Sep-2017 Juergen Gross <jgross@suse.com>

x86/paravirt: Remove no longer used paravirt functions

With removal of lguest some of the paravirt functions are no longer
needed:

->read_cr4()
->store_idt()
->set_pmd_at()
->set_pud_at()
->pte_update()

Remove them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6c690ee1 12-Jun-2017 Andy Lutomirski <luto@kernel.org>

x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()

The kernel has several code paths that read CR3. Most of them assume that
CR3 contains the PGD's physical address, whereas some of them awkwardly
use PHYSICAL_PAGE_MASK to mask off low bits.

Add explicit mask macros for CR3 and convert all of the CR3 readers.
This will keep them from breaking when PCID is enabled.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# af25ed59 31-Oct-2016 Andy Lutomirski <luto@kernel.org>

x86/fpu: Remove clts()

The kernel doesn't use clts() any more. Remove it and all of its
paravirt infrastructure.

A careful reader may notice that xen_clts() appears to have been
buggy -- it didn't update xen_cr0_value.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm list <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/3d3c8ca62f17579b9849a013d71e59a4d5d1b079.1477951965.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0d50612c 31-Oct-2016 Andy Lutomirski <luto@kernel.org>

x86/fpu: Remove stts()

It has no callers any more, and it was always a bit confusing, as
there is no STTS instruction.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm list <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/04247401710b230849e58bf2112ce4fd0b9840e1.1477951965.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1ef55be1 29-Sep-2016 Andy Lutomirski <luto@kernel.org>

x86/asm: Get rid of __read_cr4_safe()

We use __read_cr4() vs __read_cr4_safe() inconsistently. On
CR4-less CPUs, all CR4 bits are effectively clear, so we can make
the code simpler and more robust by making __read_cr4() always fix
up faults on 32-bit kernels.

This may fix some bugs on old 486-like CPUs, but I don't have any
easy way to test that.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: david@saggiorato.net
Link: http://lkml.kernel.org/r/ea647033d357d9ce2ad2bbde5a631045f5052fb6.1475178370.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# fd1d961d 02-Jun-2016 Dan Williams <dan.j.williams@intel.com>

x86/insn: remove pcommit

The pcommit instruction is being deprecated in favor of either ADR
(asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
Firmware Interface Table).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 9e90199c 22-Mar-2016 Xiao Guangrong <guangrong.xiao@linux.intel.com>

x86: pkey: introduce write_pkru() for KVM

KVM will use it to switch pkru between guest and host.

CC: Ingo Molnar <mingo@redhat.com>
CC: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# a927cb83 12-Feb-2016 Dave Hansen <dave.hansen@linux.intel.com>

x86/mm/pkeys: Add functions to fetch PKRU

This adds the raw instruction to access PKRU as well as some
accessor functions that correctly handle when the CPU does not
support the instruction. We don't use it here, but we will use
read_pkru() in the next patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210215.15238D34@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ca7d9b79 11-May-2015 Ross Zwisler <zwisler@kernel.org>

x86/mm: Add kerneldoc comments for pcommit_sfence()

Add kerneldoc comments for pcommit_sfence() describing the
purpose of the PCOMMIT instruction and demonstrating its usage
with an example.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H Peter Anvin <h.peter.anvin@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/1430261196-2401-1-git-send-email-ross.zwisler@linux.intel.com
Link: http://lkml.kernel.org/r/1431332153-18566-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d9dc64f3 27-Jan-2015 Ross Zwisler <zwisler@kernel.org>

x86/asm: Add support for the CLWB instruction

Add support for the new CLWB (cache line write back)
instruction. This instruction was announced in the document
"Intel Architecture Instruction Set Extensions Programming
Reference" with reference number 319433-022.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

The CLWB instruction is used to write back the contents of
dirtied cache lines to memory without evicting the cache lines
from the processor's cache hierarchy. This should be used in
favor of clflushopt or clflush in cases where you require the
cache line to be written to memory but plan to access the data
again in the near future.

One of the main use cases for this is with persistent memory
where CLWB can be used with PCOMMIT to ensure that data has been
accepted to memory and is durable on the DIMM.

This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH
and PCOMMIT with appropriate fencing:

void flush_and_commit_buffer(void *vaddr, unsigned int size)
{
void *vend = vaddr + size - 1;

for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
clwb(vaddr);

/* Flush any possible final partial cacheline */
clwb(vend);

/*
* Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes.
* (MFENCE via mb() also works)
*/
wmb();

/* PCOMMIT and the required SFENCE for ordering */
pcommit_sfence();
}

After this function completes the data pointed to by vaddr is
has been accepted to memory and will be durable if the vaddr
points to persistent memory.

Regarding the details of how the alternatives assembly is set
up, we need one additional byte at the beginning of the CLFLUSH
so that we can flip it into a CLFLUSHOPT by changing that byte
into a 0x66 prefix. Two options are to either insert a 1 byte
ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX. Both have no
functional effect with the plain CLFLUSH, but I've been told
that executing a CLFLUSH + prefix should be faster than
executing a CLFLUSH + NOP.

We had to hard code the assembly for CLWB because, lacking the
ability to assemble the CLWB instruction itself, the next
closest thing is to have an xsaveopt instruction with a 0x66
prefix. Unfortunately XSAVEOPT itself is also relatively new,
and isn't included by all the GCC versions that the kernel needs
to support.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 719d359d 19-Feb-2015 Ross Zwisler <zwisler@kernel.org>

x86/asm: Add support for the pcommit instruction

Add support for the new pcommit (persistent commit) instruction.
This instruction was announced in the document "Intel
Architecture Instruction Set Extensions Programming Reference"
with reference number 319433-022:

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

The pcommit instruction ensures that data that has been flushed
from the processor's cache hierarchy with clwb, clflushopt or
clflush is accepted to memory and is durable on the DIMM. The
primary use case for this is persistent memory.

This function shows how to properly use clwb/clflushopt/clflush
and pcommit with appropriate fencing:

void flush_and_commit_buffer(void *vaddr, unsigned int size)
{
void *vend = vaddr + size - 1;

for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
clwb(vaddr);

/* Flush any possible final partial cacheline */
clwb(vend);

/*
* sfence to order clwb/clflushopt/clflush cache flushes
* mfence via mb() also works
*/
wmb();

/* pcommit and the required sfence for ordering */
pcommit_sfence();
}

After this function completes the data pointed to by vaddr is
has been accepted to memory and will be durable if the vaddr
points to persistent memory.

Pcommit must always be ordered by an mfence or sfence, so to
help simplify things we include both the pcommit and the
required sfence in the alternatives generated by
pcommit_sfence(). The other option is to keep them separated,
but on platforms that don't support pcommit this would then turn
into:

void flush_and_commit_buffer(void *vaddr, unsigned int size)
{
void *vend = vaddr + size - 1;

for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
clwb(vaddr);

/* Flush any possible final partial cacheline */
clwb(vend);

/*
* sfence to order clwb/clflushopt/clflush cache flushes
* mfence via mb() also works
*/
wmb();

nop(); /* from pcommit(), via alternatives */

/*
* sfence to order pcommit
* mfence via mb() also works
*/
wmb();
}

This is still correct, but now you've got two fences separated
by only a nop. With the commit and the fence together in
pcommit_sfence() you avoid the final unneeded fence.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1424367448-24254-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1e02ce4c 24-Oct-2014 Andy Lutomirski <luto@amacapital.net>

x86: Store a per-cpu shadow copy of CR4

Context switches and TLB flushes can change individual bits of CR4.
CR4 reads take several cycles, so store a shadow copy of CR4 in a
per-cpu variable.

To avoid wasting a cache line, I added the CR4 shadow to
cpu_tlbstate, which is already touched in switch_mm. The heaviest
users of the cr4 shadow will be switch_mm and __switch_to_xtra, and
__switch_to_xtra is called shortly after switch_mm during context
switch, so the cacheline is likely to be hot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 171699f7 26-Feb-2014 Ross Zwisler <zwisler@kernel.org>

x86: Add support for the clflushopt instruction

Add support for the new clflushopt instruction. This instruction was
announced in the document "Intel Architecture Instruction Set Extensions
Programming Reference" with Ref # 319433-018.

http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf

[ hpa: changed the feature flag to simply X86_FEATURE_CLFLUSHOPT - if
that is what we want to report in /proc/cpuinfo anyway... ]

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Link: http://lkml.kernel.org/r/1393441612-19729-2-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 277d5b40 05-Aug-2013 Andi Kleen <ak@linux.intel.com>

x86, asmlinkage: Make several variables used from assembler/linker script visible

Plus one function, load_gs_index().

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-10-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 1d10f6ee 29-May-2013 Jan Beulich <JBeulich@suse.com>

x86: __force_order doesn't need to be an actual variable

It being static causes over a dozen instances to be scattered
across the kernel image, with non of them ever being referenced
in any way. Making the variable extern without ever defining it
works as well - all we need is to have the compiler think the
variable is being accessed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/51A610B802000078000D99A0@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f05e798a 28-Mar-2012 David Howells <dhowells@redhat.com>

Disintegrate asm/system.h for X86

Disintegrate asm/system.h for X86.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
cc: x86@kernel.org