History log of /freebsd-current/sys/x86/include/x86_var.h
Revision Date Author Comments
# c6113ac5 13-May-2024 Konstantin Belousov <kib@FreeBSD.org>

AMD CPUs: update bits and data from CPUID 0x8000_0008

from AMD APM vol3 doc no 24594 Rev. 3.36 March 2024

Reviewed and tested by: emaste
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D45188


# d63ea036 04-Jan-2024 Mark Johnston <markj@FreeBSD.org>

x86: Make cpu_model[] public

No functional change intended.

Reviewed by: emaste, imp, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43281


# ebaea1bc 11-Sep-2023 Olivier Certner <olce.freebsd@certner.fr>

x86: AMD Zen2: Zenbleed chicken bit mitigation

Applies only to bare-metal Zen2 processors. The system currently
automatically applies it to all of them.

Tunable/sysctl 'machdep.mitigations.zenbleed.enable' can be used to
forcibly enable or disable the mitigation at boot or run-time. Possible
values are:

0: Mitigation disabled
1: Mitigation enabled
2: Run the automatic determination.

Currently, value 2 is the default and has identical effect as value 1.
This might change in the future if we choose to take into account
microcode revisions in the automatic determination process.

The tunable/sysctl value is simply ignored on non-applicable CPU models,
which is useful to apply the same configuration on a set of machines
that do not all have Zen2 processors. Trying to set it to any integer
value not listed above is silently equivalent to setting it to value 2
(automatic determination).

The current mitigation state can be queried through sysctl
'machdep.mitigations.zenbleed.state', which returns "Not applicable",
"Mitigation enabled" or "Mitigation disabled". Note that this state is
not guaranteed to be accurate in case of intervening modifications of
the corresponding chicken bit directly via cpuctl(4) (this includes the
cpucontrol(8) utility). Resetting the desired policy through
'machdep.mitigations.zenbleed.enable' (possibly to its current value)
will reset the hardware state and ensure that the reported state is
again coherent with it.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41817


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 5c321467 01-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

amd64: Eliminate write only cpu_fxsr.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D38289
MFC after: 1 week


# fd25c622 07-Sep-2022 Konstantin Belousov <kib@FreeBSD.org>

i386: check that trap() and syscall() run on the thread kstack

and not on the trampoline stack. This is a useful way to ensure that
we did not enabled interrupts while on user %cr3 or trampoline stack.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 050f5a84 29-Jun-2022 Dmitry Chagin <dchagin@FreeBSD.org>

amd64: Reload CPU ext features after resume or cr4 changes

Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D35555
MFC after: 2 weeks


# fe2c9f83 26-Apr-2022 Dmitry Chagin <dchagin@FreeBSD.org>

Remove dead code.

is_physical_memory() dead since 235a54de.

Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35056
MFC after: 2 weeks


# 254e4e5b 28-Dec-2021 John Baldwin <jhb@FreeBSD.org>

Simplify swi for bus_dma.

When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.

Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.

One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.

Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447


# 1adebe3c 17-Nov-2021 Mitchell Horne <mhorne@FreeBSD.org>

minidump: Parameterize minidumpsys()

The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.

This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.

Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.

The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.

Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989


# 652ae7b1 28-Jul-2021 Adam Fenn <adam@fenn.io>

x86: cpufunc: Add rdtsc_ordered()

Add a variant of 'rdtsc()' that performs the ordered version of 'rdtsc'
appropriate for the invoking x86 variant.

Also, expose the 'lfence'-ed and 'mfence'-ed 'rdtsc()' variants needed
by 'rdtsc_ordered()' for general use.

Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31416


# de8374df 12-Aug-2021 Dmitry Chagin <dchagin@FreeBSD.org>

fork: Allow ABI to specify fork return values for child.

At least Linux x86 ABI's does not use carry bit and expects that the dx register
is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork().

Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.

Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D31472
MFC after: 2 weeks


# d0bc4b46 02-Aug-2021 Konstantin Belousov <kib@FreeBSD.org>

x86_msr_op: extend the KPI to allow MSR read and single-CPU operations

Reivewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31386


# a8b75a57 09-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

x86: add x86_clear_dbregs() helper

Move the code from exec_setregs() to reset debug registers state on exec,
to the x86_clear_dbregs() helper

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29687


# 15dc1d44 19-Feb-2021 Mitchell Horne <mhorne@FreeBSD.org>

x86: implement kdb watchpoint functions

Add wrappers around the dbreg interface that can be consumed by MI
kernel debugger code. The dbreg functions themselves are updated to
return error codes, not just -1. dbreg_set_watchpoint() is extended to
accept access bits as an argument.

Reviewed by: jhb, kib, markj
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D29155


# c02c04f1 19-Mar-2021 Mitchell Horne <mhorne@FreeBSD.org>

x86: consolidate hw watchpoint logic into new file

This is a prerequisite to using these functions outside of ddb, but also
provides some cleanup and minor refactoring. This code is almost
entirely duplicated between the two implementations, the only
significant difference being the lack of dbreg synchronization on i386.

Cleanups are:
- demote some internal functions to static
- use the constant NDBREGS instead of a '4' literal
- remove K&R definitions
- some added comments

Reviewed by: kib, jhb
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29153


# a2495c36 08-Feb-2021 Roger Pau Monné <royger@FreeBSD.org>

xen/boot: allow specifying boot method when booted from Xen

Allow setting the bootmethod variable from the Xen PVH entry point, in
order to be able to correctly set the underlying firmware mode when
booted as a dom0.

Move the bootmethod variable to be defined in x86/cpu_machdep.c
instead so it can be shared by both i386 and amd64.

Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D28619


# d3ba71b2 14-Oct-2020 Konstantin Belousov <kib@FreeBSD.org>

Limit workaround for errata E400 to appropriate AMD cpus.

From Linux sources and several datasheets I looked at, it seems that
the workaround is only needed on families 0xf and 0x10. For instance,
Ryzens do not implement the accessed MSR at all, it is documented as
reserved. Also, hypervisors should not allow guest to put CPU into
idle state, so activate workaround only when on bare hardware.

While there, style the code:
move MSR defines to specialreg.h
move identification to initcpu.c

Reported by: whu
Reviewed by: avg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26470


# 5e8ea68f 03-Oct-2020 Konstantin Belousov <kib@FreeBSD.org>

Move ctx_switch_xsave declaration to amd64 md_var.h.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# ab041f71 21-Sep-2020 D Scott Phillips <scottph@FreeBSD.org>

Move vm_page_dump bitset array definition to MI code

These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.

Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129


# 3a3f1e9d 18-Aug-2020 Peter Grehan <grehan@FreeBSD.org>

Export a routine to provide the TSC_AUX MSR value and use this in vmm.

Also, drop an unnecessary set of braces.

Requested by: kib
Reviewed by: kib
MFC after: 3 weeks


# 17edf152 12-Jun-2020 Konstantin Belousov <kib@FreeBSD.org>

Control for Special Register Buffer Data Sampling mitigation.

New microcode update for Intel enables mitigation for SRBDS, which
slows down RDSEED and related instructions. The update also provides
a control to limit the mitigation to SGX enclaves, which should
restore the speed of random generator by the cost of potential
cross-core bufer sampling.

See https://software.intel.com/security-software-guidance/insights/deep-dive-special-register-buffer-data-sampling

GIve the user control over it.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25221


# ea602083 20-May-2020 Konstantin Belousov <kib@FreeBSD.org>

amd64: Add a knob to flush RSB on context switches if machine has SMEP.

The flush is needed to prevent cross-process ret2spec, which is not handled
on kernel entry if IBPB is enabled but SMEP is present.
While there, add i386 RSB flush.

Reported by: Anthony Steinhauser <asteinhauser@google.com>
Reviewed by: markj, Anthony Steinhauser
Discussed with: philip
admbugs: 961
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# a324b7f7 25-Feb-2020 Konstantin Belousov <kib@FreeBSD.org>

Fix IBRS for machines with IBRS_ALL capability.

When turning IBRS mitigation using sysctl, as opposed to loader tunable,
send IPI to tweak MSR on all cores. Right now code only performed MSR write
onr the CPU where sysctl was run.

Properly report hw.ibrs_active for IBRS_ALL. Split hw_ibrs_ibpb_active out
from ibrs_active, to keep the current semantic of guiding kernel entry and
exit handlers.

Reported and tested by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# fa83f689 18-Nov-2019 Konstantin Belousov <kib@FreeBSD.org>

Add x86 msr tweak KPI.

Use the KPI to tweak MSRs in mitigation code.

Reviewed by: markj, scottl
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22431


# e3721601 15-Nov-2019 Scott Long <scottl@FreeBSD.org>

TSX Asynchronous Abort mitigation for Intel CVE-2019-11135.
This CVE has already been announced in FreeBSD SA-19:26.mcu.

Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.

Control knobs are:
machdep.mitigations.taa.enable:
0 - no software mitigation is enabled
1 - attempt to disable TSX
2 - use the VERW mitigation
3 - automatically select the mitigation based on processor
features.

machdep.mitigations.taa.state:
inactive - no mitigation is active/enabled
TSX disable - TSX is disabled in the bare metal CPU as well as
- any virtualized CPUs
VERW - VERW instruction clears CPU buffers
not vulnerable - The CPU has identified itself as not being
vulnerable

Nothing in the base FreeBSD system uses TSX. However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.

Reviewed by: emaste, imp, kib, scottph
Sponsored by: Intel
Differential Revision: 22374


# bb044eaf 17-Oct-2019 Conrad Meyer <cem@FreeBSD.org>

x86: Fetch and save standard CPUID leaf 6 in identcpu

Rather than a few scattered places in the tree. Organize flag names in a
contiguous region of specialreg.h.

While here, delete deprecated PCOMMIT from leaf 7.

No functional change.


# 949f834a 17-May-2019 Stephen J. Kiernan <stevek@FreeBSD.org>

Instead of individual conditional statements to look for each hypervisor
type, use a table to make it easier to add more in the future, if needed.

Add VirtualBox detection to the table ("VBoxVBoxVBox" is the hypervisor
vendor string to look for.) Also add VM_GUEST_VBOX to the VM_GUEST
enumeration to indicate VirtualBox.

Save the CPUID base for the hypervisor entry that we detected. Driver code
may need to know about it in order to obtain additional CPUID features.

Approved by: bryanv, jhb
Differential Revision: https://reviews.freebsd.org/D16305


# 7355a02b 14-May-2019 Konstantin Belousov <kib@FreeBSD.org>

Mitigations for Microarchitectural Data Sampling.

Microarchitectural buffers on some Intel processors utilizing
speculative execution may allow a local process to obtain a memory
disclosure. An attacker may be able to read secret data from the
kernel or from a process when executing untrusted code (for example,
in a web browser).

Reference: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00233.html
Security: CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091
Security: FreeBSD-SA-19:07.mds
Reviewed by: jhb
Tested by: emaste, lwhsu
Approved by: so (gtetlow)


# eb785fab 06-Feb-2019 Konstantin Belousov <kib@FreeBSD.org>

Port sysctl kern.elf32.read_exec from amd64 to i386.

Make it more comprehensive on i386, by not setting nx bit for any
mapping, not just adding PF_X to all kernel-loaded ELF segments. This
is needed for the compatibility with older i386 programs that assume
that read access implies exec, e.g. old X servers with hand-rolled
module loader.

Reported and tested by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 9a527560 29-Jan-2019 Konstantin Belousov <kib@FreeBSD.org>

i386: Merge PAE and non-PAE pmaps into same kernel.

Effectively all i386 kernels now have two pmaps compiled in: one
managing PAE pagetables, and another non-PAE. The implementation is
selected at cold time depending on the CPU features. The vm_paddr_t is
always 64bit now. As result, nx bit can be used on all capable CPUs.

Option PAE only affects the bus_addr_t: it is still 32bit for non-PAE
configs, for drivers compatibility. Kernel layout, esp. max kernel
address, low memory PDEs and max user address (same as trampoline
start) are now same for PAE and for non-PAE regardless of the type of
page tables used.

Non-PAE kernel (when using PAE pagetables) can handle physical memory
up to 24G now, larger memory requires re-tuning the KVA consumers and
instead the code caps the maximum at 24G. Unfortunately, a lot of
drivers do not use busdma(9) properly so by default even 4G barrier is
not easy. There are two tunables added: hw.above4g_allow and
hw.above24g_allow, the first one is kept enabled for now to evaluate
the status on HEAD, second is only for dev use.

i386 now creates three freelists if there is any memory above 4G, to
allow proper bounce pages allocation. Also, VM_KMEM_SIZE_SCALE changed
from 3 to 1.

The PAE_TABLES kernel config option is retired.

In collaboarion with: pho
Discussed with: emaste
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D18894


# 83813c66 12-Nov-2018 Konstantin Belousov <kib@FreeBSD.org>

Apply fix to un-cripple max cpu id on BSP earlier.

We need to know actual value for the standard extended features before
ifuncs are resolved.

Reported and tested by: madpilot
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 7705dd4d 25-Jun-2018 Konstantin Belousov <kib@FreeBSD.org>

Provide a helper function acpi_get_fadt_bootflags() to fetch the FADT
x86 boot flags.

Reviewed by: royger
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D16004
MFC after: 1 week


# 9e2154ff 21-May-2018 John Baldwin <jhb@FreeBSD.org>

Cleanups related to debug exceptions on x86.

- Add constants for fields in DR6 and the reserved fields in DR7. Use
these constants instead of magic numbers in most places that use DR6
and DR7.
- Refer to T_TRCTRAP as "debug exception" rather than a "trace trap"
as it is not just for trace exceptions.
- Always read DR6 for debug exceptions and only clear TF in the flags
register for user exceptions where DR6.BS is set.
- Clear DR6 before returning from a debug exception handler as
recommended by the SDM dating all the way back to the 386. This
allows debuggers to determine the cause of each exception. For
kernel traps, clear DR6 in the T_TRCTRAP case and pass DR6 by value
to other parts of the handler (namely, user_dbreg_trap()). For user
traps, wait until after trapsignal to clear DR6 so that userland
debuggers can read DR6 via PT_GETDBREGS while the thread is stopped
in trapsignal().

Reviewed by: kib, rgrimes
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D15189


# 3621ba1e 21-May-2018 Konstantin Belousov <kib@FreeBSD.org>

Add Intel Spec Store Bypass Disable control.

Speculative Store Bypass (SSB) is a speculative execution side channel
vulnerability identified by Jann Horn of Google Project Zero (GPZ) and
Ken Johnson of the Microsoft Security Response Center (MSRC)
https://bugs.chromium.org/p/project-zero/issues/detail?id=1528.
Updated Intel microcode introduces a MSR bit to disable SSB as a
mitigation for the vulnerability.

Introduce a sysctl hw.spec_store_bypass_disable to provide global
control over the SSBD bit, akin to the existing sysctl that controls
IBRS. The sysctl can be set to one of three values:
0: off
1: on
2: auto

Future work will enable applications to control SSBD on a per-process
basis (when it is not enabled globally).

SSBD bit detection and control was verified with prerelease microcode.

Security: CVE-2018-3639
Tested by: emaste (previous version, without updated microcode)
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 986c4ca3 30-Apr-2018 Konstantin Belousov <kib@FreeBSD.org>

Turn off IBRS on suspend.

Resume starts CPU from the init state, which clears any loaded
microcode updates. As result, IBRS MSRs are no longer available,
until the microcode is reloaded.

I have to forcibly clear cpu_stdext_feature3, which assumes that CPUID
leaf 7 reg %ebx does not report anything except Meltdown/Spectre bugs
bits. If future CPUs add new bits there, hw_ibrs_recalculate() and
identify_cpu1()/identify_cpu2() need to be adjusted for that.

Submitted and tested by: Michael Danilov <mike.d.ft402@gmail.com>
PR: 227866
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15236


# 8fbcc334 20-Mar-2018 Konstantin Belousov <kib@FreeBSD.org>

Move the CR0.WP manipulation KPI to x86.

This should allow to avoid some #ifdefs in the common x86/ code.

Requested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 319117fd 31-Jan-2018 Konstantin Belousov <kib@FreeBSD.org>

IBRS support, AKA Spectre hardware mitigation.

It is coded according to the Intel document 336996-001, reading of the
patches posted on lkml, and some additional consultations with Intel.

For existing processors, you need a microcode update which adds IBRS
CPU features, and to manually enable it by setting the tunable/sysctl
hw.ibrs_disable to 0. Current status can be checked in sysctl
hw.ibrs_active. The mitigation might be inactive if the CPU feature
is not patched in, or if CPU reports that IBRS use is not required, by
IA32_ARCH_CAP_IBRS_ALL bit.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14029


# b3327f62 19-Jan-2018 Ed Maste <emaste@FreeBSD.org>

Enable KPTI by default on amd64 for non-AMD CPUs

Kernel Page Table Isolation (KPTI) was introduced in r328083 as a
mitigation for the 'Meltdown' vulnerability. AMD CPUs are not affected,
per https://www.amd.com/en/corporate/speculative-execution:

We believe AMD processors are not susceptible due to our use of
privilege level protections within paging architecture and no
mitigation is required.

Thus default KPTI to off for AMD CPUs, and to on for others. This may
be refined later as we obtain more specific information on the sets of
CPUs that are and are not affected.

Submitted by: Mitchell Horne
Reviewed by: cem
Relnotes: Yes
Security: CVE-2017-5754
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D13971


# bd50262f 17-Jan-2018 Konstantin Belousov <kib@FreeBSD.org>

PTI for amd64.

The implementation of the Kernel Page Table Isolation (KPTI) for
amd64, first version. It provides a workaround for the 'meltdown'
vulnerability. PTI is turned off by default for now, enable with the
loader tunable vm.pmap.pti=1.

The pmap page table is split into kernel-mode table and user-mode
table. Kernel-mode table is identical to the non-PTI table, while
usermode table is obtained from kernel table by leaving userspace
mappings intact, but only leaving the following parts of the kernel
mapped:

kernel text (but not modules text)
PCPU
GDT/IDT/user LDT/task structures
IST stacks for NMI and doublefault handlers.

Kernel switches to user page table before returning to usermode, and
restores full kernel page table on the entry. Initial kernel-mode
stack for PTI trampoline is allocated in PCPU, it is only 16
qwords. Kernel entry trampoline switches page tables. then the
hardware trap frame is copied to the normal kstack, and execution
continues.

IST stacks are kept mapped and no trampoline is needed for
NMI/doublefault, but of course page table switch is performed.

On return to usermode, the trampoline is used again, iret frame is
copied to the trampoline stack, page tables are switched and iretq is
executed. The case of iretq faulting due to the invalid usermode
context is tricky, since the frame for fault is appended to the
trampoline frame. Besides copying the fault frame and original
(corrupted) frame to kstack, the fault frame must be patched to make
it look as if the fault occured on the kstack, see the comment in
doret_iret detection code in trap().

Currently kernel pages which are mapped during trampoline operation
are identical for all pmaps. They are registered using
pmap_pti_add_kva(). Besides initial registrations done during boot,
LDT and non-common TSS segments are registered if user requested their
use. In principle, they can be installed into kernel page table per
pmap with some work. Similarly, PCPU can be hidden from userspace
mapping using trampoline PCPU page, but again I do not see much
benefits besides complexity.

PDPE pages for the kernel half of the user page tables are
pre-allocated during boot because we need to know pml4 entries which
are copied to the top-level paging structure page, in advance on a new
pmap creation. I enforce this to avoid iterating over the all
existing pmaps if a new PDPE page is needed for PTI kernel mappings.
The iteration is a known problematic operation on i386.

The need to flush hidden kernel translations on the switch to user
mode make global tables (PG_G) meaningless and even harming, so PG_G
use is disabled for PTI case. Our existing use of PCID is
incompatible with PTI and is automatically disabled if PTI is
enabled. PCID can be forced on only for developer's benefit.

MCE is known to be broken, it requires IST stack to operate completely
correctly even for non-PTI case, and absolutely needs dedicated IST
stack because MCE delivery while trampoline did not switched from PTI
stack is fatal. The fix is pending.

Reviewed by: markj (partially)
Tested by: pho (previous version)
Discussed with: jeff, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# e8c770a6 13-Jan-2018 Konstantin Belousov <kib@FreeBSD.org>

Enumerate and print Intel CPU features for Speculative Execution Side
Channel Mitigations.

The definitions are taken from the document 336996-001.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 0530a936 05-Jan-2018 Konstantin Belousov <kib@FreeBSD.org>

Make it possible to re-evaluate cpu_features.

Add cpuctl(4) ioctl CPUCTL_EVAL_CPU_FEATURES which forces re-read of
cpu_features, cpu_features2, cpu_stdext_features, and
std_stdext_features2.

The intent is to allow the kernel to see the changes in the CPU
features after micocode update. Of course, the update is not atomic
across variables and not synchronized with readers. See the man page
warning as well.

Reviewed by: imp (previous version), jilles
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13770


# 194446f9 20-Sep-2017 Conrad Meyer <cem@FreeBSD.org>

x86: Decode AMD "Extended Feature Extensions ID EBX" bits

In particular, this determines CPU support for the CLZERO instruction.

(No, I am not making this name up.)

Sponsored by: Dell EMC Isilon


# cd8c2581 07-Sep-2017 Conrad Meyer <cem@FreeBSD.org>

Store AMD RAS Capabilities cpuid value and name flags

Reviewed by: truckman
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12237


# fd1f83fb 10-Aug-2017 Roger Pau Monné <royger@FreeBSD.org>

apic_enumerator: only set mp_ncpus and mp_maxid at probe cpus phase

Populate the lapics arrays and call cpu_add/lapic_create in the setup
phase instead. Also store the max APIC ID found in the newly
introduced max_apic_id global variable.

This is a requirement in order to make the static arrays currently
using MAX_LAPIC_ID dynamic.

Sponsored by: Citrix Systems R&D
MFC after: 1 month
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D11911


# b5669d0a 09-Aug-2017 Jung-uk Kim <jkim@FreeBSD.org>

Split identify_cpu() into two functions for amd64 as we do for i386. This
reduces diff between amd64 and i386. Also, it fixes a regression introduced
in r322076, i.e., identify_hypervisor() failed to identify some hypervisors.
This function assumes cpu_feature2 is already initialized.

Reported by: dexuan
Tested by: dexuan


# 01050344 05-Aug-2017 Jung-uk Kim <jkim@FreeBSD.org>

Detect hypervisors early. We used to set lower hz on hypervisors by default
but it was broken since r273800 (and r278522, its MFC to stable/10) because
identify_cpu() is called too late, i.e., after init_param1().

MFC after: 3 days


# 16dcd773 27-Oct-2016 John Baldwin <jhb@FreeBSD.org>

MFamd64: Add bounds checks on addresses used with /dev/mem.

Reject attempts to read from or memory map offsets in /dev/mem that are
beyond the maximum-supported physical address of the current CPU.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7408


# 295f4b6c 24-Oct-2016 Konstantin Belousov <kib@FreeBSD.org>

Follow-up to r307866:
- Make !KDB config buildable.
- Simplify interface to nmi_handle_intr() by evaluating panic_on_nmi
in one place, namely nmi_call_kdb(). This allows to remove do_panic
argument from the functions, and to remove i386/amd64 duplication of
the variable and sysctl definitions. Note that now NMI causes
panic(9) instead of trap_fatal() reporting and then panic(9),
consistently for NMIs delivered while CPU operated in ring 0 and 3.

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 835c2787 24-Oct-2016 Konstantin Belousov <kib@FreeBSD.org>

Handle broadcast NMIs.

On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs
reporting hardware errors are broadcasted to all CPUs.

When kernel is configured to enter kdb on NMI, the outcome is
problematic, because each CPU tries to enter kdb. All CPUs are
executing NMI handlers, which set the latches disabling the nested NMI
delivery; this means that stop_cpus_hard(), used by kdb_enter() to
stop other cpus by broadcasting IPI_STOP_HARD NMI, cannot work. One
indication of this is the harmless but annoying diagnostic "timeout
stopping cpus".

Much more harming behaviour is that because all CPUs try to enter kdb,
and if ddb is used as debugger, all CPUs issue prompt on console and
race for the input, not to mention the simultaneous use of the ddb
shared state.

Try to fix this by introducing a pseudo-lock for simultaneous attempts
to handle NMIs. If one core happens to enter NMI trap handler, other
cores see it and simulate reception of the IPI_STOP_HARD. More,
generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting
for the acknowledgement, relying on the nmi handler on other cores
suspending and then restarting the CPU.

Since it is impossible to detect at runtime whether some stray NMI is
broadcast or unicast, add a knob for administrator (really developer)
to configure debugging NMI handling mode.

The updated patch was debugged with the help from Andrey Gapon (avg)
and discussed with him.

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D8249


# 38605d73 15-Sep-2016 John Baldwin <jhb@FreeBSD.org>

Remove 'cpu' and 'cpu_class' on amd64.

The 'cpu' and 'cpu_class' variables were always set to the same value
on amd64 and are legacy holdovers from i386. Remove them entirely on
amd64.

Reviewed by: imp, kib (older version)
Differential Revision: https://reviews.freebsd.org/D7888


# 0d63fc3e 12-Apr-2016 Andriy Gapon <avg@FreeBSD.org>

re-enable AMD Topology extension on certain models if disabled by BIOS

Some BIOSes disable AMD Topology extension on AMD Family 15h notebook
processors. We re-enable the extension, so that we can properly discover
core and cache topology. Linux seems to do the same.

Reported by: Johannes Dieterich <dieterich.joh@gmail.com>
Reviewed by: jhb, kib
Tested by: Johannes Dieterich <dieterich.joh@gmail.com>
(earlier version)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D5883


# 0df87548 29-Mar-2016 Konstantin Belousov <kib@FreeBSD.org>

Type of the interrupt handlers on x86 cannot be expressed in C.
Simplify and unify placeholder type definitions.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5771


# 7c958a41 07-Dec-2015 Konstantin Belousov <kib@FreeBSD.org>

Merge common parts of i386 and amd64 md_var.h and smp.h into
new headers x86/include x86_var.h and x86_smp.h.

Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4358