History log of /openbsd-current/sys/arch/amd64/amd64/identcpu.c
Revision Date Author Comments
# 1.144 16-Jun-2024 kn

Make GENERIC boot on ZHAOXIN KaiXian KX-6640MA

The Unchartevice 6640MA notebook comes with such a CentaurHauls CPU,
installs via RAMDISK_CD (with AHCI fix), but GENERIC would hang after
cpu0: 4MB 64b/line 16-way L2 cache

Pretty sure Intel TPM sensor code should run on Intel CPUs, anyway.

Idea from brynet
OK deraadt brynet


# 1.143 14-May-2024 guenther

Instead of enabling use of PCLMUL and AESNI iff cpu0 supports them
via two global variables, make cpu_ecxfeature the intersection of
cpuid(1).ecx on all CPUs and switch cpu_configure() to directly
check that for the requisite flags.

ok kettenis@
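
The intersection described in this entry can be sketched in a few lines of C. This is an illustration only, not the committed code: the function names and the array-of-words interface are hypothetical, and the two bit masks are the CPUID.1:ECX positions for PCLMULQDQ (bit 1) and AES-NI (bit 25).

```c
#include <stddef.h>
#include <stdint.h>

#define CPUIDECX_PCLMUL 0x00000002u /* CPUID.1:ECX bit 1, PCLMULQDQ */
#define CPUIDECX_AES    0x02000000u /* CPUID.1:ECX bit 25, AES-NI */

/*
 * Hypothetical helper: AND together the cpuid(1).ecx word of every CPU.
 * A feature bit survives only if all CPUs report it.
 */
uint32_t
ecx_intersection(const uint32_t *percpu_ecx, size_t ncpus)
{
	uint32_t common = ~(uint32_t)0;
	size_t i;

	if (ncpus == 0)
		return 0;
	for (i = 0; i < ncpus; i++)
		common &= percpu_ecx[i];
	return common;
}

/* Hypothetical check: use AES-NI only if the intersection has the bit. */
int
can_use_aesni(uint32_t common_ecx)
{
	return (common_ecx & CPUIDECX_AES) != 0;
}
```

With this shape, a machine where one CPU lacks AES-NI ends up with the bit cleared in the shared word, so the accelerated path is never selected.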


# 1.142 12-May-2024 guenther

Delete the cpu_perf_e[abd]x and cpu_apmi_edx globals and move the
cpuid uses into identifycpu(), as they aren't needed anywhere else.

ok kettenis@


# 1.141 11-May-2024 guenther

Use %b to format cpu flag info in dmesg, so we have the raw values
too. This is also much more space efficient.
Reduce the cpu flag noise in dmesg by suppressing lines and registers
that are identical with the previous CPU and show -/+ info if there
are any differences.

particular feedback from deraadt@, kettenis@, jsg@, and dv@
ok deraadt@
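
The -/+ reporting mentioned in this entry comes down to two mask operations per register. A minimal sketch, assuming a hypothetical struct and function name (the real dmesg code is not reproduced here):

```c
#include <stdint.h>

/* Hypothetical result type: bits gained and lost relative to the previous CPU. */
struct flag_diff {
	uint32_t added;   /* set on this CPU, clear on the previous one */
	uint32_t removed; /* clear on this CPU, set on the previous one */
};

struct flag_diff
diff_flags(uint32_t prev, uint32_t cur)
{
	struct flag_diff d;

	d.added = cur & ~prev;
	d.removed = prev & ~cur;
	return d;
}
```

When both fields are zero the register is identical to the previous CPU's and the line can be suppressed.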


# 1.140 03-Apr-2024 guenther

Add ci_cpuid_level and ci_vendor holding the per-CPU basic cpuid
level and a numeric mapping of the cpu vendor, both from CPUID(0).
Convert the general use of strcmp(cpu_vendor) to simple numeric
tests of ci_vendor. Track the minimum of all ci_cpuid_level in the
cpuid_level global and continue to use that for what we vmm exposes.

AMD testing help matthieu@ krw@
ok miod@ deraadt@ cheloha@
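
As an illustration of replacing string comparisons with a numeric vendor, the following sketch decodes the CPUID(0) vendor string once and tracks the minimum basic level. The enum, function names, and interface are assumptions made for this example; only the CPUID(0) register layout (vendor string in EBX, EDX, ECX) is taken as fact.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical numeric vendor codes. */
enum cpu_vendor_id { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD, VENDOR_VIA };

/* Unpack one register into four characters, low byte first. */
static void
unpack_reg(char *dst, uint32_t reg)
{
	int i;

	for (i = 0; i < 4; i++)
		dst[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* CPUID(0) returns the 12-byte vendor string in EBX, EDX, ECX, in that order. */
enum cpu_vendor_id
vendor_from_cpuid0(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
	char name[13];

	unpack_reg(name + 0, ebx);
	unpack_reg(name + 4, edx);
	unpack_reg(name + 8, ecx);
	name[12] = '\0';

	if (strcmp(name, "GenuineIntel") == 0)
		return VENDOR_INTEL;
	if (strcmp(name, "AuthenticAMD") == 0)
		return VENDOR_AMD;
	if (strcmp(name, "CentaurHauls") == 0)
		return VENDOR_VIA;
	return VENDOR_UNKNOWN;
}

/* Minimum basic cpuid level over all CPUs (0 if there are none). */
uint32_t
min_cpuid_level(const uint32_t *levels, size_t ncpus)
{
	uint32_t lowest;
	size_t i;

	if (ncpus == 0)
		return 0;
	lowest = levels[0];
	for (i = 1; i < ncpus; i++)
		if (levels[i] < lowest)
			lowest = levels[i];
	return lowest;
}
```

After the one-time decode, later checks become plain integer comparisons such as `vendor == VENDOR_AMD`.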


# 1.139 17-Mar-2024 guenther

Use VERW to mitigate the RFDS (Register File Data Sampling) vulnerability
present in Intel Atom CPUs, reordering some ASM in return-to-userspace and
start/resume-vmx-guest to reduce the number of kernel values still live in
registers when VERW is used. This mitigation requires updated firmware;
with it, affected CPUs report RFDS_CLEAR in dmesg.

Firmware packaging by jsg@ and sthen@
Logic for interpreting intel's flags by jsg@ after lots of discussion
between him, deraadt@, and me
ok deraadt@


Revision tags: OPENBSD_7_4_BASE OPENBSD_7_5_BASE
# 1.138 03-Sep-2023 mlarkin

vmm(4): Suppress AMD HwPstate visibility to guests

On newer Ryzen/EPYC, we need to hide the HwPstate field in CPUID
80000007:EDX, or guests will try to access the MSRs associated
with it, and that will fail with #GP.

ok deraadt


# 1.137 16-Aug-2023 jsg

add Intel ARCH_CAP_GDS bits

mentioned in
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html


# 1.136 09-Aug-2023 jsg

show x86 cpu patch level in dmesg
ok guenther@ deraadt@


# 1.135 27-Jul-2023 guenther

Report speculation control bits in dmesg cpu lines.

ok mlarkin@


# 1.134 21-Jul-2023 guenther

Rename ARCH_CAPABILITIES_* #defines to ARCH_CAP_*
Provide more ARCH_CAP_* defines per June 2023 SDM

ok jsg@ deraadt@


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
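
A short example of the issue and the usual fix, assuming a 32-bit int; the macro and function names are made up for illustration and do not come from the commit.

```c
#include <stdint.h>

/*
 * Undefined when int is 32 bits: the result of 1 << 31 is not
 * representable in a signed int.
 *
 *	#define BAD_BIT31	(1 << 31)
 */

/* Well defined: the left operand is unsigned, so bit 31 is representable. */
#define BIT31	(1U << 31)

/* Hypothetical test for the top bit of a 32-bit register value. */
int
has_bit31(uint32_t reg)
{
	return (reg & BIT31) != 0;
}
```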


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids are divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
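
The splitting described in this entry can be sketched as follows. This is a simplified model under stated assumptions, not the committed code: the struct and function are hypothetical, `apic_id_size` stands for the number of APIC ID bits below the package boundary, and a `threads_per_core` of 0 is treated as 1.

```c
#include <stdint.h>

/* Hypothetical result of cutting up an initial local APIC ID. */
struct cpu_topo {
	uint32_t pkg;	/* package (socket) id */
	uint32_t core;	/* core id within the package */
	uint32_t smt;	/* thread id within the core */
};

struct cpu_topo
split_apic_id(uint32_t apicid, unsigned int apic_id_size,
    unsigned int threads_per_core)
{
	struct cpu_topo t;
	uint32_t in_pkg;

	if (threads_per_core == 0)
		threads_per_core = 1;	/* default: one thread per core */

	if (apic_id_size >= 32) {
		/* Avoid shifting by the full width of the type. */
		t.pkg = 0;
		in_pkg = apicid;
	} else {
		/* Bits at and above apic_id_size select the package. */
		t.pkg = apicid >> apic_id_size;
		in_pkg = apicid & ((1u << apic_id_size) - 1);
	}

	/* What remains is divided up by the threads-per-core value. */
	t.core = in_pkg / threads_per_core;
	t.smt = in_pkg % threads_per_core;
	return t;
}
```

For example, an APIC ID of 0x13 with 4 bits below the package boundary and 2 threads per core lands in package 1, core 1, thread 1.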


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
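
The measurement idea in this entry reduces to a ratio: count TSC ticks and reference-counter ticks over the same interval, then scale by the reference counter's known frequency. A minimal sketch, with a hypothetical function name and no claim to match the committed calibration loop:

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the TSC frequency in Hz.
 *   tsc_delta  TSC ticks elapsed during the measurement window
 *   ref_delta  reference timecounter ticks elapsed in the same window
 *   ref_freq   known frequency of the reference timecounter, in Hz
 * Returns 0 if the reference counter did not advance.
 * Assumes tsc_delta * ref_freq fits in 64 bits, which holds for
 * short windows (well under a second at GHz rates).
 */
uint64_t
tsc_freq_from_reference(uint64_t tsc_delta, uint64_t ref_delta,
    uint64_t ref_freq)
{
	if (ref_delta == 0)
		return 0;
	return tsc_delta * ref_freq / ref_delta;
}
```

For instance, 240000000 TSC ticks observed over 1431818 ticks of a 14318180 Hz reference counter (0.1 s) gives 2400000000 Hz.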


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
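
A guard of the kind this entry describes might look like the sketch below. The function and parameter names are hypothetical: `max_basic` stands for the highest basic leaf reported by CPUID(0) and `max_ext` for the highest extended leaf reported by CPUID(0x80000000).

```c
#include <stdint.h>

/*
 * Hypothetical check: is it safe to query this CPUID leaf?
 * A feature flag may suggest a leaf exists, but firmware can lower the
 * reported maximum, so the maximum is what gets tested.
 */
int
leaf_available(uint32_t max_basic, uint32_t max_ext, uint32_t leaf)
{
	if (leaf >= 0x80000000u)
		return max_ext >= 0x80000000u && leaf <= max_ext;
	return leaf <= max_basic;
}
```

With a maximum basic leaf of 0xb, a query of leaf 0xd (the extended state enumeration leaf) is skipped regardless of what the feature flags say.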


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
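
To show why Tj(Max) matters for accuracy, here is a small sketch of the arithmetic. The bit positions are my reading of Intel's documentation rather than something stated in this entry (Tj(Max) in bits 23:16 of the temperature target MSR, the digital readout in bits 22:16 of the thermal status MSR, with bit 31 marking a valid reading), and the function names are made up.

```c
#include <stdint.h>

/* Assumed layout: Tj(Max) in degrees C lives in bits 23:16. */
int
tjmax_from_target(uint64_t target_msr)
{
	return (int)((target_msr >> 16) & 0xff);
}

/*
 * Assumed layout: bits 22:16 hold the number of degrees below Tj(Max),
 * bit 31 says whether the reading is valid.
 * Returns the temperature in degrees C, or -1 if the reading is invalid.
 */
int
core_temp_celsius(uint64_t therm_status, int tjmax)
{
	if ((therm_status & (1ULL << 31)) == 0)
		return -1;
	return tjmax - (int)((therm_status >> 16) & 0x7f);
}
```

Because the readout is an offset below Tj(Max), assuming a fixed Tj(Max) that is wrong for a given part shifts every reported temperature by the same error.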


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
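
A rough sketch of the extraction that 1.16 refers to, assuming the usual
convention from the vendor CPUID documentation: the extended model is
applied for base families 0x6 and 0xf, and the extended family only for
base family 0xf. The struct cpu_sig type and decode_signature name are
made up for this example and are not the fields added to struct cpu_info.

```c
#include <stdint.h>

/* Hypothetical container for the decoded CPUID(1).EAX signature. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig sig;

	sig.stepping = eax & 0xf;
	sig.model = (eax >> 4) & 0xf;
	sig.family = (eax >> 8) & 0xf;

	/* Extended model (bits 19:16) applies to base families 0x6 and 0xf. */
	if (sig.family == 0x6 || sig.family == 0xf)
		sig.model += ((eax >> 16) & 0xf) << 4;

	/* Extended family (bits 27:20) applies to base family 0xf only. */
	if (sig.family == 0xf)
		sig.family += (eax >> 20) & 0xff;

	return sig;
}
```

Decoding once and storing the result means later checks can compare
against the full family and model directly instead of repeating the bit
manipulation at each use.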


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64. Like its i386 counterpart, it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount, corresponding to the performance level set by the user
(or by apmd on their behalf).

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism, such
as powernow, to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others, thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@
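
One way the cleanup described in 1.3 could look, as a sketch only: drop
leading spaces and collapse each run of spaces to a single one, in place.
The function name squeeze_spaces and its return value are assumptions for
this example, not the code that was backported from i386.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: remove leading spaces and collapse runs of
 * spaces into one, rewriting the string in place.
 * Returns the length of the resulting string.
 */
size_t
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;
	int prev_space = 1;	/* start as if after a space: drops leading ones */

	assert(s != NULL);
	for (; *src != '\0'; src++) {
		if (*src == ' ') {
			if (prev_space)
				continue;	/* skip leading or repeated space */
			prev_space = 1;
		} else
			prev_space = 0;
		*dst++ = *src;
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```

A single trailing space survives this pass, since only leading and
duplicated spaces are handled here.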


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.143 14-May-2024 guenther

Instead of enabling use of PCLMUL and AESNI iff cpu0 supports them
via two global variables, make cpu_ecxfeature the intersection of
cpuid(1).ecx on all CPUs and switch cpu_configure() to directly
check that for the requisite flags.

ok kettenis@


# 1.142 12-May-2024 guenther

Delete the cpu_perf_e[abd]x and cpu_apmi_edx globals and move the
cpuid uses into identifycpu(), as they aren't needed anywhere else.

ok kettenis@


# 1.141 11-May-2024 guenther

Use %b to format cpu flag info in dmesg, so we have the raw values
too. This is also much more space efficient.
Reduce the cpu flag noise in dmesg by suppressing lines and registers
that are identical with the previous CPU and show -/+ info if there
are any differences.

particular feedback from deraadt@, kettenis@, jsg@, and dv@
ok deraadt@


# 1.140 03-Apr-2024 guenther

Add ci_cpuid_level and ci_vendor holding the per-CPU basic cpuid
level and a numeric mapping of the cpu vendor, both from CPUID(0).
Convert the general use of strcmp(cpu_vendor) to simple numeric
tests of ci_vendor. Track the minimum of all ci_cpuid_level in the
cpuid_level global and continue to use that for what we vmm exposes.

AMD testing help matthieu@ krw@
ok miod@ deraadt@ cheloha@


# 1.139 17-Mar-2024 guenther

Use VERW to mitigate the RFDS (Register File Data Sampling) vulnerability
present in Intel Atom CPUs, reordering some ASM in return-to-userspace and
start/resume-vmx-guest to reduce the number of kernel values still live in
registers when VERW is used. This mitigation requires updated firmware which
has affected CPUs report RFDS_CLEAR in dmesg.

Firmware packaging by jsg@ and sthen@
Logic for interpreting intel's flags by jsg@ after lots of discussion
between him, deraadt@, and I
ok deraadt@


Revision tags: OPENBSD_7_4_BASE OPENBSD_7_5_BASE
# 1.138 03-Sep-2023 mlarkin

vmm(4): Suppress AMD HwPstate visibility to guests

On newer Ryzen/EPYC, we need to hide the HwPstate CPUID 80000007:EDX
field for HwPstate, or guests will try to access the MSRs associated
with those, and that will fail with #GP.

ok deraadt


# 1.137 16-Aug-2023 jsg

add Intel ARCH_CAP_GDS bits

mentioned in
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html


# 1.136 09-Aug-2023 jsg

show x86 cpu patch level in dmesg
ok guenther@ deraadt@


# 1.135 27-Jul-2023 guenther

Report speculation control bits in dmesg cpu lines.

ok mlarkin@


# 1.134 21-Jul-2023 guenther

Rename ARCH_CAPABILITIES_* #defined to ARCH_CAP_*
Provide more ARCH_CAP_* defines per June 2023 SDM

ok jsg@ deraadt@


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of te unreasonable clock speed reports on Intel Tiget Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for som e new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerablity has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior than that is used, otherwise
use the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadtipper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, If the LFENCE serialization MSR bit is already set, then
we don't need to uncondtionally set it.

Worksaround a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on an single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariants TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inapproapriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove uneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
archictectures and core relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isnt reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
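
The "conditionally extended" family/model decoding mentioned above can be
sketched roughly as follows. This is a minimal illustration of the usual
CPUID(1).EAX signature layout, not the code from this revision; the names
struct cpu_sig and decode_signature are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the decoded values (not struct cpu_info). */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/*
 * Decode a CPUID(1).EAX signature.  The extended model field is only
 * meaningful for base families 6 and 15, and the extended family field
 * only for base family 15, hence "conditionally extended".
 */
static struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig s;

	s.stepping = eax & 0xf;
	s.model = (eax >> 4) & 0xf;
	s.family = (eax >> 8) & 0xf;

	/* Check the base family before it is extended below. */
	if (s.family == 0x6 || s.family == 0xf)
		s.model += ((eax >> 16) & 0xf) << 4;
	if (s.family == 0xf)
		s.family += (eax >> 20) & 0xff;

	return s;
}
```

Decoding once and storing the result lets later checks, such as a test
against a specific model number, compare plain integers instead of
re-extracting bit fields from the signature.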


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@
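
As a rough illustration of the cleanup described above, the sketch below
squeezes a brand string in place. It is an assumed reimplementation for
illustration only, not the backported diff; the function name
squeeze_spaces is hypothetical.

```c
/*
 * Squeeze a NUL-terminated string in place: drop leading spaces and
 * collapse every run of spaces into a single space.  A single trailing
 * space, if present, is kept.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;
	int prev_space = 1;	/* start of string counts as "after a space" */

	for (; *src != '\0'; src++) {
		if (*src == ' ') {
			if (prev_space)
				continue;	/* leading or duplicated space */
			prev_space = 1;
		} else
			prev_space = 0;
		*dst++ = *src;
	}
	*dst = '\0';
}
```

Because the write pointer never passes the read pointer, the string can be
rewritten in its own buffer without a temporary copy.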


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


Revision tags: OPENBSD_7_4_BASE OPENBSD_7_5_BASE
# 1.138 03-Sep-2023 mlarkin

vmm(4): Suppress AMD HwPstate visibility to guests

On newer Ryzen/EPYC, we need to hide the HwPstate CPUID 80000007:EDX
field for HwPstate, or guests will try to access the MSRs associated
with those, and that will fail with #GP.

ok deraadt


# 1.137 16-Aug-2023 jsg

add Intel ARCH_CAP_GDS bits

mentioned in
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html


# 1.136 09-Aug-2023 jsg

show x86 cpu patch level in dmesg
ok guenther@ deraadt@


# 1.135 27-Jul-2023 guenther

Report speculation control bits in dmesg cpu lines.

ok mlarkin@


# 1.134 21-Jul-2023 guenther

Rename ARCH_CAPABILITIES_* #defined to ARCH_CAP_*
Provide more ARCH_CAP_* defines per June 2023 SDM

ok jsg@ deraadt@


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of te unreasonable clock speed reports on Intel Tiget Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for som e new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerablity has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior than that is used, otherwise
use the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadtipper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, If the LFENCE serialization MSR bit is already set, then
we don't need to uncondtionally set it.

Worksaround a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on an single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariants TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inapproapriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove uneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.
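
As a hedged illustration of where that value comes from: CPUID leaf 1 reports the CLFLUSH line size in EBX bits 15:8, in units of 8 bytes. The helper name below is hypothetical and the sketch takes the register value as an argument instead of executing CPUID.

```c
#include <stdint.h>

/* CPUID.1:EBX[15:8] is the CLFLUSH line size in 8-byte units. */
static unsigned int
clflush_line_size(uint32_t cpuid1_ebx)
{
	return ((cpuid1_ebx >> 8) & 0xff) * 8;
}
```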


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
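
A sketch of the conventional decoding of the CPUID leaf 1 signature (EAX), to show what "conditionally extended" means here. The struct and function names are hypothetical and the kernel's own field names may differ.

```c
#include <stdint.h>

struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Decode family/model/stepping from the CPUID leaf 1 EAX signature. */
static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;

	s.family = (sig >> 8) & 0x0f;
	s.model = (sig >> 4) & 0x0f;
	s.stepping = sig & 0x0f;

	/* The extended model only applies to base families 6 and 0xf. */
	if (s.family == 0x6 || s.family == 0xf)
		s.model += ((sig >> 16) & 0x0f) << 4;

	/* The extended family only applies to base family 0xf. */
	if (s.family == 0xf)
		s.family += (sig >> 20) & 0xff;

	return s;
}
```

Doing this once and storing the result avoids repeating the conditionals at every place that needs the family or model.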


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.140 03-Apr-2024 guenther

Add ci_cpuid_level and ci_vendor holding the per-CPU basic cpuid
level and a numeric mapping of the cpu vendor, both from CPUID(0).
Convert the general use of strcmp(cpu_vendor) to simple numeric
tests of ci_vendor. Track the minimum of all ci_cpuid_level in the
cpuid_level global and continue to use that for what we vmm exposes.

AMD testing help matthieu@ krw@
ok miod@ deraadt@ cheloha@


# 1.139 17-Mar-2024 guenther

Use VERW to mitigate the RFDS (Register File Data Sampling) vulnerability
present in Intel Atom CPUs, reordering some ASM in return-to-userspace and
start/resume-vmx-guest to reduce the number of kernel values still live in
registers when VERW is used. This mitigation requires updated firmware,
with which affected CPUs report RFDS_CLEAR in dmesg.

Firmware packaging by jsg@ and sthen@
Logic for interpreting intel's flags by jsg@ after lots of discussion
between him, deraadt@, and I
ok deraadt@


Revision tags: OPENBSD_7_4_BASE OPENBSD_7_5_BASE
# 1.138 03-Sep-2023 mlarkin

vmm(4): Suppress AMD HwPstate visibility to guests

On newer Ryzen/EPYC, we need to hide the HwPstate CPUID 80000007:EDX
field for HwPstate, or guests will try to access the MSRs associated
with those, and that will fail with #GP.

ok deraadt


# 1.137 16-Aug-2023 jsg

add Intel ARCH_CAP_GDS bits

mentioned in
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html


# 1.136 09-Aug-2023 jsg

show x86 cpu patch level in dmesg
ok guenther@ deraadt@


# 1.135 27-Jul-2023 guenther

Report speculation control bits in dmesg cpu lines.

ok mlarkin@


# 1.134 21-Jul-2023 guenther

Rename ARCH_CAPABILITIES_* #defines to ARCH_CAP_*
Provide more ARCH_CAP_* defines per June 2023 SDM

ok jsg@ deraadt@


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
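
A short sketch of the issue: with a 32-bit int, `1 << 31` shifts into the sign bit, which the C standard leaves undefined, while an unsigned constant is well defined. The macro and function names are hypothetical.

```c
#include <stdint.h>

/*
 * (1 << 31) is undefined for a 32-bit signed int because the result
 * is not representable; an unsigned operand makes the shift defined.
 */
#define EXAMPLE_BIT31	(1U << 31)

/* Test bit 31 of a 32-bit flag word without signed arithmetic. */
static int
has_bit31(uint32_t v)
{
	return (v & EXAMPLE_BIT31) != 0;
}
```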


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
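
The split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed code: the struct and function names are hypothetical, and the ApicIdSize and ThreadsPerCore values are passed in as plain arguments instead of being read from CPUID.

```c
#include <stdint.h>

struct topo {
	uint32_t pkg;	/* package (socket) id */
	uint32_t core;	/* core id within the package */
	uint32_t smt;	/* thread id within the core */
};

/*
 * Cut an initial local APIC id into package, core, and smt values.
 * apic_id_size is the number of low bits that address cores/threads
 * inside one package; threads_per_core defaults to 1 when unknown.
 */
static struct topo
split_apicid(uint32_t apicid, unsigned int apic_id_size,
    unsigned int threads_per_core)
{
	struct topo t;
	uint32_t mask, in_pkg;

	if (threads_per_core == 0)
		threads_per_core = 1;
	/* Avoid an out-of-range shift if the size covers the whole id. */
	mask = apic_id_size >= 32 ? 0xffffffffU : (1U << apic_id_size) - 1;
	in_pkg = apicid & mask;

	t.pkg = apic_id_size >= 32 ? 0 : apicid >> apic_id_size;
	t.core = in_pkg / threads_per_core;
	t.smt = in_pkg % threads_per_core;
	return t;
}
```

For example, APIC id 0x13 with a 4-bit ApicIdSize and two threads per core yields package 1, core 1, thread 1.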


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used; otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, If the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
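
A minimal sketch of the kind of guard this describes, with the maximum basic and extended leaf numbers passed in as arguments. The function name is hypothetical; in the kernel the corresponding values are cpuid_level and ci->ci_pnfeatset.

```c
#include <stdint.h>

/*
 * Return nonzero if a CPUID leaf may be queried, given the highest
 * basic leaf (from CPUID 0) and highest extended leaf (from CPUID
 * 0x80000000) that the CPU or firmware actually exposes.
 */
static int
cpuid_leaf_available(uint32_t max_basic, uint32_t max_ext, uint32_t leaf)
{
	if (leaf >= 0x80000000U)
		return max_ext >= 0x80000000U && leaf <= max_ext;
	return leaf <= max_basic;
}
```

A feature flag such as AVX would then be trusted only when its state-enumeration leaf (0xd) passes this check as well.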


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
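
The brand string cleanup described here (together with the leading and
duplicated space removal from rev 1.3) can be sketched as below. This is
a minimal illustration, not the kernel's code; squeeze_spaces() is a
hypothetical name.

```c
/*
 * Normalize a brand string in place: drop leading spaces, collapse
 * runs of spaces to a single space, and drop trailing spaces.
 * squeeze_spaces() is a hypothetical helper for illustration only.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	/* skip leading spaces */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* drop a space followed by another space or the end */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

The copy runs in place because the destination pointer never overtakes
the source pointer, so no scratch buffer is needed.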


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
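
For reference, the "conditionally extended" family/model values can be
derived from the CPUID(1).eax signature roughly as follows. This sketch
follows the commonly documented convention (extended model applies to
base families 0x6 and 0xf, extended family to base family 0xf) and is
not the code from this revision; struct cpu_sig and decode_signature()
are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical container; the kernel stores these in struct cpu_info. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/*
 * Decode a CPUID(1).eax signature:
 *   bits  3:0  stepping     bits  7:4  base model
 *   bits 11:8  base family  bits 19:16 extended model
 *   bits 27:20 extended family
 */
static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;

	s.stepping = sig & 0x0f;
	s.model = (sig >> 4) & 0x0f;
	s.family = (sig >> 8) & 0x0f;

	/* extended model is only meaningful for base family 6 or 15 */
	if (s.family == 0x6 || s.family == 0xf)
		s.model += ((sig >> 16) & 0x0f) << 4;
	/* extended family is only added when base family is 15 */
	if (s.family == 0xf)
		s.family += (sig >> 20) & 0xff;

	return s;
}
```

Doing the decode once and caching the result means later checks such as
"ci_model == 0xe" do not each have to repeat the bit extraction.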


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.139 17-Mar-2024 guenther

Use VERW to mitigate the RFDS (Register File Data Sampling) vulnerability
present in Intel Atom CPUs, reordering some ASM in return-to-userspace and
start/resume-vmx-guest to reduce the number of kernel values still live in
registers when VERW is used. This mitigation requires updated firmware,
with which affected CPUs report RFDS_CLEAR in dmesg.

Firmware packaging by jsg@ and sthen@
Logic for interpreting intel's flags by jsg@ after lots of discussion
between him, deraadt@, and I
ok deraadt@


Revision tags: OPENBSD_7_4_BASE OPENBSD_7_5_BASE
# 1.138 03-Sep-2023 mlarkin

vmm(4): Suppress AMD HwPstate visibility to guests

On newer Ryzen/EPYC, we need to hide the HwPstate CPUID 80000007:EDX
field for HwPstate, or guests will try to access the MSRs associated
with those, and that will fail with #GP.

ok deraadt


# 1.137 16-Aug-2023 jsg

add Intel ARCH_CAP_GDS bits

mentioned in
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html


# 1.136 09-Aug-2023 jsg

show x86 cpu patch level in dmesg
ok guenther@ deraadt@


# 1.135 27-Jul-2023 guenther

Report speculation control bits in dmesg cpu lines.

ok mlarkin@


# 1.134 21-Jul-2023 guenther

Rename ARCH_CAPABILITIES_* #defined to ARCH_CAP_*
Provide more ARCH_CAP_* defines per June 2023 SDM

ok jsg@ deraadt@


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
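
A minimal sketch of the issue, assuming a 32-bit int: the constant 1 is
a signed int, so shifting it into the sign bit is undefined, while an
unsigned constant is well defined. The macro and function names below
are hypothetical and are not the kernel's definitions.

```c
#include <stdint.h>

/*
 * Hypothetical flag macros for illustration.
 * FLAG_BAD(31) shifts a signed int into its sign bit, which is
 * undefined behavior in C when int is 32 bits wide.
 * FLAG_OK(31) uses an unsigned constant and is well defined.
 */
#define FLAG_BAD(n)	(1 << (n))
#define FLAG_OK(n)	(1U << (n))

/* Return the top bit of a 32-bit flags word without invoking UB. */
static uint32_t
top_bit(void)
{
	return FLAG_OK(31);
}
```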


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
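
The splitting described above can be sketched as follows. This is an
illustration of the idea only, not the code from this commit: struct
topo and split_apicid() are hypothetical names, and the caller is
assumed to have already read the initial local APIC ID, the ApicIdSize
bit count, and the ThreadsPerCore value from CPUID.

```c
#include <stdint.h>

/* Hypothetical result type; the kernel keeps these in struct cpu_info. */
struct topo {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/*
 * Cut an initial local APIC ID into package, core, and thread IDs.
 * apicid_size: number of low bits that hold the core/thread part.
 * threads_per_core: 1 unless the CPU reports a higher value.
 */
static struct topo
split_apicid(uint32_t apicid, unsigned int apicid_size,
    unsigned int threads_per_core)
{
	struct topo t;
	uint32_t mask, local;

	if (threads_per_core == 0)
		threads_per_core = 1;	/* default when not reported */
	if (apicid_size > 31)
		apicid_size = 31;	/* keep the shifts well defined */

	mask = (1U << apicid_size) - 1;
	local = apicid & mask;		/* core/thread bits in the package */

	t.pkg_id = apicid >> apicid_size;
	t.core_id = local / threads_per_core;
	t.smt_id = local % threads_per_core;
	return t;
}
```

For example, with an APIC ID of 0x1b, an ApicIdSize of 4, and two
threads per core, the low four bits (0xb) give core 5, thread 1, and the
remaining high bits give package 1.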


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
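
The check described in this entry can be sketched as follows. This is a minimal illustration, not the code from identcpu.c: the helper name cpuid_leaf_usable() and its parameters are hypothetical, with max_basic standing in for the value of CPUID(0).eax and max_ext for CPUID(0x80000000).eax.

```c
#include <stdint.h>

/*
 * Hypothetical helper: report whether a CPUID leaf may be queried.
 * max_basic is assumed to hold CPUID(0).eax and max_ext to hold
 * CPUID(0x80000000).eax, as read earlier on the same CPU.
 */
static int
cpuid_leaf_usable(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	/* Extended leaves live at 0x80000000 and above. */
	if (leaf >= 0x80000000U)
		return (max_ext >= 0x80000000U && leaf <= max_ext);

	/* Basic leaves are bounded by the highest basic leaf. */
	return (leaf <= max_basic);
}
```

A caller would test the leaf first and only then execute CPUID for it, even when a feature flag suggests the leaf ought to exist; for example, cpuid_leaf_usable(0xd, max_basic, max_ext) before reading the extended state enumeration leaf.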


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty, fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
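
A rough sketch of this kind of brand string cleanup, covering leading, repeated and trailing spaces in one pass. The function name squeeze_spaces() is hypothetical and the code is an illustration only, not what identcpu.c actually does.

```c
/*
 * Hypothetical helper: squeeze a brand string in place.
 * Leading spaces are dropped, runs of spaces collapse to one,
 * and trailing spaces are removed.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	/* Drop leading spaces. */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* Skip a space followed by another space or by the end. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

Because the output is never longer than the input, the string can be rewritten in its own buffer without allocating.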


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is joint work with jsg@. The fixed-function
performance counter read code comes from a former diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA Nanos support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
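
For reference, the commonly documented rule for deriving the displayed family and model from the CPUID(1).eax signature can be sketched as below. The struct and function names are hypothetical, and the exact conditions used in identcpu.c may differ from this illustration.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature fields. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/*
 * Decode a CPUID(1).eax signature. The extended model is added when
 * the base family is 0x6 or 0xf, and the extended family is added
 * when the base family is 0xf.
 */
static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig r;

	r.stepping = sig & 0xf;
	r.model = (sig >> 4) & 0xf;
	r.family = (sig >> 8) & 0xf;

	/* Test the base family before it is extended below. */
	if (r.family == 0x6 || r.family == 0xf)
		r.model += ((sig >> 16) & 0xf) << 4;
	if (r.family == 0xf)
		r.family += (sig >> 20) & 0xff;

	return (r);
}
```

Decoding once and storing the results, as this entry describes, means later code can compare plain family and model numbers instead of repeating the bit extraction.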


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.138 03-Sep-2023 mlarkin

vmm(4): Suppress AMD HwPstate visibility to guests

On newer Ryzen/EPYC, we need to hide the HwPstate CPUID 80000007:EDX
field for HwPstate, or guests will try to access the MSRs associated
with those, and that will fail with #GP.

ok deraadt


# 1.137 16-Aug-2023 jsg

add Intel ARCH_CAP_GDS bits

mentioned in
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html


# 1.136 09-Aug-2023 jsg

show x86 cpu patch level in dmesg
ok guenther@ deraadt@


# 1.135 27-Jul-2023 guenther

Report speculation control bits in dmesg cpu lines.

ok mlarkin@


# 1.134 21-Jul-2023 guenther

Rename ARCH_CAPABILITIES_* #defines to ARCH_CAP_*
Provide more ARCH_CAP_* defines per June 2023 SDM

ok jsg@ deraadt@


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
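
To illustrate the class of bug: with a 32-bit int, the expression 1 << 31 overflows a signed type, while an unsigned constant keeps the shift well defined. The BIT() macro and has_bit() helper below are hypothetical names used only for this sketch, not identifiers from the tree.

```c
#include <stdint.h>

/*
 * Writing (1 << 31) shifts a signed int into its sign bit, which is
 * undefined behavior when int is 32 bits wide. An unsigned constant
 * avoids the problem.
 */
#define BIT(n)	(1U << (n))

/* Hypothetical helper: test bit n (0..31) of a 32-bit register value. */
static int
has_bit(uint32_t reg, unsigned int n)
{
	return ((reg & BIT(n)) != 0);
}
```

The same fix applies to feature flag definitions that name bit 31 of a CPUID register.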


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids are divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
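
The splitting scheme described in this entry can be sketched roughly as follows. The names split_apicid(), apic_id_size and threads_per_core are hypothetical stand-ins for the values the message calls ApicIdSize and ThreadsPerCore, and the real code in identcpu.c may arrange the arithmetic differently.

```c
#include <stdint.h>

/* Hypothetical result of cutting up an initial local APIC id. */
struct topo {
	uint32_t pkg;
	uint32_t core;
	uint32_t smt;
};

/*
 * Split an APIC id into package, core and smt ids. The low
 * apic_id_size bits identify the core/thread within a package; the
 * bits above them identify the package.
 */
static struct topo
split_apicid(uint32_t apicid, unsigned int apic_id_size,
    uint32_t threads_per_core)
{
	struct topo t;
	uint32_t mask, rest;

	/* Default to one thread per core when no value is reported. */
	if (threads_per_core == 0)
		threads_per_core = 1;

	/* Avoid shifting by the full width of the type. */
	if (apic_id_size >= 32) {
		mask = 0xffffffffU;
		t.pkg = 0;
	} else {
		mask = (1U << apic_id_size) - 1;
		t.pkg = apicid >> apic_id_size;
	}

	rest = apicid & mask;
	t.core = rest / threads_per_core;
	t.smt = rest % threads_per_core;
	return (t);
}
```

With an APIC id of 27, a 4-bit field and two threads per core, this yields package 1, core 5, thread 1.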


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
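
The arithmetic behind such a recalibration can be sketched like this: count TSC ticks and reference timecounter ticks over the same interval, then scale by the reference frequency. The function name tsc_freq_from_ref() and its parameters are hypothetical; this is an illustration of the idea, not the committed code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the TSC frequency in Hz from tick
 * deltas measured over the same interval on the TSC and on a
 * reference timecounter with a known frequency (ref_hz).
 *
 * The product tsc_delta * ref_hz must fit in 64 bits, which holds
 * for short measurement intervals.
 */
static uint64_t
tsc_freq_from_ref(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_hz)
{
	/* No reference ticks elapsed: nothing can be derived. */
	if (ref_delta == 0)
		return (0);

	return (tsc_delta * ref_hz / ref_delta);
}
```

As an illustrative case, 240000000 TSC ticks counted across 1431818 ticks of a 14318180 Hz reference correspond to a 2.4 GHz TSC. The result is only as precise as the reference counter's resolution, as the entry notes.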


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that the TSC is not used when we really don't want it. The
kernel bumps the quality to 2000 for constant, invariant TSCs on
the latest CPUs only.

OK mikeb@
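
How the quality value steers the choice can be sketched as a simple maximum search. This is an illustration only: `struct tc` and `best_timecounter()` are hypothetical, and the i8254 quality of 0 is an assumption inferred from "less than the i8254".

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for a timecounter entry. */
struct tc {
	const char *name;
	int quality;
};

/* Return the entry with the highest quality, or NULL if n is 0. */
static const struct tc *
best_timecounter(const struct tc *tcs, size_t n)
{
	const struct tc *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (best == NULL || tcs[i].quality > best->quality)
			best = &tcs[i];
	return best;
}

/* Assumed qualities: i8254 at 0, TSC at -1000 by default or 2000. */
static const struct tc tc_default[] = { { "i8254", 0 }, { "tsc", -1000 } };
static const struct tc tc_invariant[] = { { "i8254", 0 }, { "tsc", 2000 } };
```

With the default quality the i8254 entry wins; once the TSC is bumped to 2000 it takes over.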


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
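
The check described above can be sketched as a pure function. The helper name is hypothetical; it assumes `max_basic` is the eax value returned by CPUID(0) (cpuid_level) and `max_ext` is the eax value returned by CPUID(0x80000000) (what ci_pnfeatset appears to hold).

```c
#include <stdint.h>

/*
 * Decide whether a CPUID leaf may be queried, based only on the
 * maximum basic and extended leaves the CPU (or BIOS) advertises.
 * Hypothetical helper for illustration.
 */
static int
cpuid_leaf_usable(uint32_t max_basic, uint32_t max_ext, uint32_t leaf)
{
	if (leaf >= 0x80000000U)
		return max_ext >= 0x80000000U && leaf <= max_ext;
	return leaf <= max_basic;
}
```

A feature flag alone is therefore not enough: leaf 0xd is skipped when the advertised maximum is lower, even if the XSAVE/AVX bits are set.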


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@
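
The prefer-then-fall-back order can be sketched with function pointers standing in for the two instructions. Everything named here is hypothetical; the real code issues rdseed and rdrand directly.

```c
#include <stddef.h>
#include <stdint.h>

/* A source returns 1 and stores a value on success, 0 when it has none. */
typedef int (*rng_source)(uint64_t *);

/*
 * Try the preferred source (rdseed in the commit) first; if it is
 * absent or reports no data, fall back to the second one (rdrand).
 * Returns 1 if *out was filled, 0 if neither source delivered.
 */
static int
hw_random(rng_source preferred, rng_source fallback, uint64_t *out)
{
	if (preferred != NULL && preferred(out))
		return 1;
	if (fallback != NULL && fallback(out))
		return 1;
	return 0;
}

/* Hypothetical stand-ins for the real instructions. */
static int src_empty(uint64_t *v) { (void)v; return 0; }
static int src_seed(uint64_t *v) { *v = 0x1111; return 1; }
static int src_rand(uint64_t *v) { *v = 0x2222; return 1; }
```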


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
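
Because the readout counts degrees below Tj(Max), the absolute temperature depends on knowing that maximum. A minimal sketch of the conversion follows; the function name is hypothetical, and reporting in microkelvin is an assumption about the sensor framework's unit.

```c
#include <stdint.h>

/*
 * Convert a "degrees below Tj(Max)" readout into an absolute
 * temperature in microkelvin. Hypothetical helper for illustration.
 */
static int64_t
core_temp_ukelvin(uint32_t tjmax_celsius, uint32_t readout_below_max)
{
	int64_t celsius = (int64_t)tjmax_celsius - (int64_t)readout_below_max;

	return celsius * 1000000 + 273150000;
}
```

A wrong Tj(Max) guess shifts every reported value by the same offset, which is why reading the real limit from the MSR helps accuracy.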


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
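
Brand string cleanup of this kind (leading, duplicated, and trailing spaces) can be done in one in-place pass. This is a sketch with a hypothetical function name, not the code from identcpu.c.

```c
#include <stddef.h>

/*
 * Normalize a string in place: drop leading spaces, collapse runs of
 * spaces to a single one, and strip trailing spaces. Returns the new
 * length. Hypothetical helper for illustration.
 */
static size_t
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	while (*src == ' ')			/* leading spaces */
		src++;
	while (*src != '\0') {
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;			/* duplicate or trailing */
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```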


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@
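
For readers unfamiliar with it, nitems() is the usual element-count macro. The sketch below defines it locally so it compiles outside the kernel; the table and helper are hypothetical.

```c
#include <stddef.h>

/* Same shape as the nitems() macro; defined here only if missing. */
#ifndef nitems
#define nitems(_a)	(sizeof((_a)) / sizeof((_a)[0]))
#endif

/* Hypothetical table, standing in for a list of feature names. */
static const char *const demo_flags[] = { "FPU", "VME", "DE", "PSE" };

/* Number of entries in the table, computed at compile time. */
static size_t
demo_flag_count(void)
{
	return nitems(demo_flags);
}
```

The macro only works on true arrays; applied to a pointer it silently yields the wrong count.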


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.
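
The line size is reported by CPUID leaf 1 in ebx bits 15:8, in units of 8 bytes. A minimal sketch of the decoding, with a hypothetical function name, assuming the caller already has the ebx value:

```c
#include <stdint.h>

/*
 * CPUID(1).ebx bits 15:8 hold the CLFLUSH line size in 8-byte units;
 * return it in bytes. Hypothetical helper for illustration.
 */
static uint32_t
clflush_line_size(uint32_t cpuid1_ebx)
{
	return ((cpuid1_ebx >> 8) & 0xff) * 8;
}
```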


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
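
"Conditionally extended" refers to how the CPUID(1).eax signature is decoded. The sketch below follows the vendor documentation's rules rather than the exact code in the tree; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the decoded values. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/*
 * Decode the CPUID(1).eax signature. The extended family is added
 * only when the base family is 0xf, and the extended model only when
 * the base family is 0x6 or 0xf.
 */
static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;
	uint32_t base_family = (sig >> 8) & 0x0f;

	s.stepping = sig & 0x0f;
	s.family = base_family;
	s.model = (sig >> 4) & 0x0f;

	if (base_family == 0x0f)
		s.family += (sig >> 20) & 0xff;
	if (base_family == 0x06 || base_family == 0x0f)
		s.model += ((sig >> 16) & 0x0f) << 4;
	return s;
}
```

Decoding once and storing the result avoids repeating these shifts at every place that needs the family or model.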


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others, thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.137 16-Aug-2023 jsg

add Intel ARCH_CAP_GDS bits

mentioned in
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html


# 1.136 09-Aug-2023 jsg

show x86 cpu patch level in dmesg
ok guenther@ deraadt@


# 1.135 27-Jul-2023 guenther

Report speculation control bits in dmesg cpu lines.

ok mlarkin@


# 1.134 21-Jul-2023 guenther

Rename ARCH_CAPABILITIES_* #defined to ARCH_CAP_*
Provide more ARCH_CAP_* defines per June 2023 SDM

ok jsg@ deraadt@


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
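
The issue and its usual fix can be shown in a few lines. This is a sketch with hypothetical names, assuming a 32-bit `int`: `1 << 31` shifts a signed value into the sign bit, while an unsigned constant keeps the shift well defined.

```c
#include <stdint.h>

/*
 * Hypothetical flag definition for illustration only.
 * Writing (1 << 31) here would be undefined behavior with a 32-bit
 * int; the U suffix makes the operand unsigned.
 */
#define FLAG_BIT31	(1U << 31)

/* Return nonzero if bit 31 is set in a 32-bit feature word. */
static inline int
has_bit31(uint32_t word)
{
	return (word & FLAG_BIT31) != 0;
}
```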


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
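
The splitting described above can be sketched as follows. This is one reading of the commit message, not the committed code: the struct, function, and parameter names are hypothetical, with `apic_id_size` standing for the "ApicIdSize" bit count and `threads_per_core` for the ThreadsPerCore value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; fields mirror ci_pkg_id and friends. */
struct topo {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/*
 * Split an initial local APIC ID into package, core and thread IDs.
 * apic_id_size is the number of low bits that identify a core/thread
 * inside a package; threads_per_core is 1 unless the CPU reports more.
 */
static struct topo
split_apicid(uint32_t apicid, unsigned int apic_id_size,
    unsigned int threads_per_core)
{
	struct topo t;
	uint32_t mask, local;

	assert(apic_id_size < 32 && threads_per_core > 0);

	mask = (1U << apic_id_size) - 1;
	local = apicid & mask;			/* core/thread part */

	t.pkg_id = apicid >> apic_id_size;	/* bits above the line */
	t.core_id = local / threads_per_core;
	t.smt_id = local % threads_per_core;
	return t;
}
```

With one thread per core every CPU gets smt_id 0, so cores are no longer misidentified as threads when the newer cpuid fields are left empty by a hypervisor.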


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
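
The read-before-write pattern can be sketched in userland with a simulated MSR. The `fake_*` helpers are hypothetical stand-ins for rdmsr/wrmsr; the MSR number and bit position follow the C001_1029[1] description in the log.

```c
#include <stdint.h>

#define MSR_DE_CFG		0xc0011029U
#define DE_CFG_SERIALIZE_LFENCE	(1ULL << 1)

/* Simulated MSR so the sketch runs outside the kernel; counts writes. */
static uint64_t fake_de_cfg;
static unsigned int fake_wrmsr_calls;

static uint64_t
fake_rdmsr(uint32_t msr)
{
	(void)msr;
	return fake_de_cfg;
}

static void
fake_wrmsr(uint32_t msr, uint64_t val)
{
	(void)msr;
	fake_de_cfg = val;
	fake_wrmsr_calls++;
}

/* Set the LFENCE-serializing bit, writing only when it is clear. */
static void
enable_lfence_serialize(void)
{
	uint64_t val = fake_rdmsr(MSR_DE_CFG);

	if ((val & DE_CFG_SERIALIZE_LFENCE) == 0)
		fake_wrmsr(MSR_DE_CFG, val | DE_CFG_SERIALIZE_LFENCE);
}
```

Skipping the write when the bit is already set is what avoids the fault on hypervisors that reject writes to this MSR.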


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on the
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
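
The check described in 1.72 can be illustrated with a minimal sketch. The
helper name cpuid_leaf_usable() and its parameters are hypothetical and are
not taken from identcpu.c; it assumes the caller already holds the maximum
basic leaf (CPUID(0).eax) and the maximum extended leaf
(CPUID(0x80000000).eax).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's code: decide whether a CPUID
 * leaf may be queried, given the maximum basic leaf and the maximum
 * extended leaf reported by the CPU (or as limited by the BIOS).
 */
static bool
cpuid_leaf_usable(uint32_t max_basic, uint32_t max_ext, uint32_t leaf)
{
	/* Extended leaves live at 0x80000000 and above. */
	if (leaf >= 0x80000000U)
		return max_ext >= 0x80000000U && leaf <= max_ext;
	/* Basic leaves are bounded by the basic cpuid level. */
	return leaf <= max_basic;
}
```

With a BIOS that caps the basic level at 0x0b, a query for leaf 0x0d (the
XSAVE/AVX state leaf) is rejected even if a feature flag suggests it exists.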


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TPM_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
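
As an illustration of the "conditionally extended" family/model values
mentioned in 1.16, here is a minimal sketch that decodes a CPUID(1).eax
signature. The struct and function names are hypothetical, and the extension
rule follows the convention in Intel's CPUID documentation, which may differ
in detail from what identcpu.c does.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature fields. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/* Decode family/model/stepping from a CPUID(1).eax signature. */
static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;
	uint32_t base_family = (sig >> 8) & 0x0f;
	uint32_t base_model = (sig >> 4) & 0x0f;

	s.stepping = sig & 0x0f;
	s.family = base_family;
	s.model = base_model;

	/* The extended family is only added when the base family is 0xf. */
	if (base_family == 0x0f)
		s.family += (sig >> 20) & 0xff;
	/* The extended model forms the high nibble for families 6 and 0xf. */
	if (base_family == 0x06 || base_family == 0x0f)
		s.model |= ((sig >> 16) & 0x0f) << 4;
	return s;
}
```

Decoding once and storing the result, as the commit does with struct
cpu_info, avoids repeating this bit extraction at every use.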


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others, thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.136 09-Aug-2023 jsg

show x86 cpu patch level in dmesg
ok guenther@ deraadt@


# 1.135 27-Jul-2023 guenther

Report speculation control bits in dmesg cpu lines.

ok mlarkin@


# 1.134 21-Jul-2023 guenther

Rename ARCH_CAPABILITIES_* #defines to ARCH_CAP_*
Provide more ARCH_CAP_* defines per June 2023 SDM

ok jsg@ deraadt@


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
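
A minimal sketch of the problem fixed in 1.122: with a 32-bit int,
"1 << 31" shifts a one into the sign bit of a signed type, which the C
standard leaves undefined, while the same shift on an unsigned operand is
well defined. The BIT32 macro name is hypothetical and is not taken from the
tree.

```c
#include <stdint.h>

/*
 * Bit n of a 32-bit register as an unsigned value; n must be 0..31.
 * Casting the 1 to uint32_t before shifting keeps the whole expression
 * unsigned, so BIT32(31) is 0x80000000 rather than undefined behavior.
 */
#define BIT32(n)	((uint32_t)1 << (n))
```

Feature-flag masks built this way can be tested against a register value
with a plain bitwise AND.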


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
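
The splitting scheme described in 1.114 can be sketched as follows. This is
an illustration under stated assumptions, not the kernel's code: the struct
and function names are hypothetical, apic_id_size is taken to be the number
of low APIC ID bits that identify a thread within a package, and
threads_per_core is taken to be already decoded (1 when the CPU does not
report it).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for the three topology ids. */
struct cpu_topo {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/* Cut an initial local APIC id into package, core and smt ids. */
static struct cpu_topo
split_apic_id(uint32_t apic_id, unsigned apic_id_size,
    uint32_t threads_per_core)
{
	struct cpu_topo t;
	uint32_t mask, thread_id;

	assert(apic_id_size < 32 && threads_per_core >= 1);

	/* Bits above apic_id_size select the package. */
	mask = ((uint32_t)1 << apic_id_size) - 1;
	t.pkg_id = apic_id >> apic_id_size;

	/* The remaining bits number the threads within the package. */
	thread_id = apic_id & mask;
	t.core_id = thread_id / threads_per_core;
	t.smt_id = thread_id % threads_per_core;
	return t;
}
```

For example, with 4 APIC ID bits per package and 2 threads per core, APIC ID
0x13 falls in package 1, core 1, thread 1.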


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used; otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
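
A minimal sketch of the "only write if needed" logic from 1.104, assuming
the LFENCE serialization bit is bit 1 of MSR C001_1029 as stated in the 1.103
log message. The helper names are hypothetical, and the actual MSR read and
write are left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of MSR C001_1029: make LFENCE dispatch serializing. */
#define DE_CFG_SERIALIZE_LFENCE	((uint64_t)1 << 1)

/* True if the current MSR value still lacks the serialization bit. */
static bool
de_cfg_needs_write(uint64_t cur)
{
	return (cur & DE_CFG_SERIALIZE_LFENCE) == 0;
}

/* The value to write back: the current value with the bit set. */
static uint64_t
de_cfg_with_lfence(uint64_t cur)
{
	return cur | DE_CFG_SERIALIZE_LFENCE;
}
```

A caller would read the MSR, and issue the write only when
de_cfg_needs_write() returns true, which avoids the faulting write on
hypervisors that already set the bit.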


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
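
A minimal sketch of the kind of guard described above, assuming a
hypothetical helper name (cpuid_leaf_usable) and plain integer inputs;
it is not the code from this commit. max_basic stands for the value of
CPUID(0).EAX and max_ext for CPUID(0x80000000).EAX:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the OpenBSD code: decide whether a CPUID leaf
 * may be queried, given the highest basic leaf (CPUID(0).EAX) and the
 * highest extended leaf (CPUID(0x80000000).EAX) reported by the CPU.
 */
static int
cpuid_leaf_usable(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000U) {
		/* a max_ext below 0x80000000 means no extended leaf is usable */
		if (max_ext < 0x80000000U)
			return 0;
		return leaf <= max_ext;
	}
	/* basic leaves: trust the reported level, not the feature flags */
	return leaf <= max_basic;
}
```

With such a guard, a feature flag that implies leaf 0xd would still be
ignored when the firmware has capped the basic level below 0xd.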


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
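
A minimal sketch of brand-string cleanup that drops leading, duplicated,
and trailing spaces in place. The function name squeeze_spaces and its
return value are assumptions for illustration; this is not the code from
the commit:

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the OpenBSD code: remove leading spaces,
 * collapse runs of spaces to one, and drop trailing spaces, in place.
 * Returns the length of the cleaned string.
 */
static size_t
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	/* skip leading spaces */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* drop a space that is followed by another space or the end */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```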


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is joint work with jsg@. The fixed-function
performance counter read code comes from an earlier diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
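
A minimal sketch of extracting the conditionally extended family and
model from the CPUID(1).EAX signature, following the usual vendor
convention (extended model applies for base family 0x6 or 0xf, extended
family for base family 0xf). The struct and function names are
hypothetical and this is not the code from the commit:

```c
#include <stdint.h>

/* Hypothetical container for the decoded values. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/* Decode a CPUID(1).EAX signature once, so callers need not repeat it. */
static struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig s;

	s.stepping = eax & 0x0f;
	s.model = (eax >> 4) & 0x0f;
	s.family = (eax >> 8) & 0x0f;

	/* extended model: bits 19:16 become the high nibble of the model */
	if (s.family == 0x6 || s.family == 0xf)
		s.model += ((eax >> 16) & 0x0f) << 4;
	/* extended family: bits 27:20 are added to a base family of 0xf */
	if (s.family == 0xf)
		s.family += (eax >> 20) & 0xff;

	return s;
}
```

For example, a signature of 0x00800f11 decodes to family 0x17, model
0x01, stepping 1.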


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.135 27-Jul-2023 guenther

Report speculation control bits in dmesg cpu lines.

ok mlarkin@


# 1.134 21-Jul-2023 guenther

Rename ARCH_CAPABILITIES_* #defined to ARCH_CAP_*
Provide more ARCH_CAP_* defines per June 2023 SDM

ok jsg@ deraadt@


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
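
A minimal sketch of the usual remedy, assuming a hypothetical macro name
(EXAMPLE_BIT); it is not the diff from this commit. With a 32-bit int,
`1 << 31` shifts into the sign bit, which is undefined behavior, whereas
the unsigned form is well defined:

```c
#include <stdint.h>

/*
 * Hypothetical flag macro: the operand is unsigned, so shifting by 31
 * is well defined and yields 0x80000000.
 */
#define EXAMPLE_BIT(n)	(1U << (n))

/* Test bit 31 of a 32-bit register value without any signed shift. */
static int
has_bit31(uint32_t reg)
{
	return (reg & EXAMPLE_BIT(31)) != 0;
}
```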


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
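
A minimal sketch of the splitting described above, assuming the initial
local APIC id, the ApicIdSize bit count, and the ThreadsPerCore value
have already been read from cpuid. The struct and function names are
hypothetical, and this is not the code from the commit:

```c
#include <stdint.h>

/* Hypothetical result type for the three topology ids. */
struct topo_ids {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/*
 * Cut an APIC id into package, core, and smt ids.  apic_id_size is the
 * number of low bits that identify a core/thread within a package;
 * threads_per_core of 0 is treated as the default of 1.
 */
static struct topo_ids
split_apic_id(uint32_t apicid, unsigned int apic_id_size,
    uint32_t threads_per_core)
{
	struct topo_ids t;
	uint32_t in_pkg;

	if (threads_per_core == 0)
		threads_per_core = 1;
	if (apic_id_size > 31)
		apic_id_size = 31;	/* keep the shifts well defined */

	/* everything above the ApicIdSize bits names the package */
	t.pkg_id = apicid >> apic_id_size;
	in_pkg = apicid & ((1U << apic_id_size) - 1);

	/* the remainder is divided up by the threads-per-core count */
	t.core_id = in_pkg / threads_per_core;
	t.smt_id = in_pkg % threads_per_core;
	return t;
}
```

For example, APIC id 0x1b with a 4-bit ApicIdSize and two threads per
core gives package 1, core 5, smt 1.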


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
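
The arithmetic behind the recalibration in rev 1.88 can be sketched as
below. This is a minimal sketch under stated assumptions, not the kernel's
code: the function name is hypothetical, and both counter deltas are
assumed to have been sampled over the same short interval so that the
product fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the TSC frequency in Hz from two deltas taken over the same
 * interval: the TSC delta and the delta of a reference counter (e.g.
 * HPET or PM timer) whose frequency ref_hz is known.
 */
static uint64_t
tsc_freq_estimate(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_hz)
{
	assert(ref_delta != 0);
	/* elapsed seconds = ref_delta / ref_hz; Hz = tsc_delta / elapsed */
	return tsc_delta * ref_hz / ref_delta;
}
```

The precision of the result is bounded by the resolution of the reference
counter, which is the point the commit message makes.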


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that the TSC is not used when we really don't want it. The
kernel bumps the quality to 2000 for constant, invariant TSCs on
the latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
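
The guard described in rev 1.72 amounts to a range check before a CPUID
leaf is queried. A minimal sketch follows; the function name is
hypothetical, max_basic stands for the value from CPUID(0).EAX (what the
log calls cpuid_level), and max_ext is assumed to be the highest extended
leaf from CPUID(0x80000000).EAX (taken here to be what ci_pnfeatset holds).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the given CPUID leaf may be queried, based on the
 * maximum basic and extended leaves the CPU (or BIOS) advertises.
 */
static bool
cpuid_leaf_usable(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000U)
		return leaf <= max_ext;		/* extended leaf range */
	return leaf <= max_basic;		/* basic leaf range */
}
```

With such a check a feature bit alone is never trusted to imply that the
corresponding leaf exists.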


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debugging purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.
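
For reference, the line size mentioned in rev 1.25 is architecturally
reported in CPUID leaf 1: EBX bits 15:8 give the CLFLUSH line size in
units of 8 bytes. A small sketch of the decoding, with a hypothetical
function name and the raw EBX value passed in by the caller:

```c
#include <stdint.h>

/*
 * Decode the CLFLUSH line size in bytes from CPUID(1).EBX.
 * Bits 15:8 hold the size in 8-byte units.
 */
static uint32_t
clflush_line_size(uint32_t cpuid1_ebx)
{
	return ((cpuid1_ebx >> 8) & 0xff) * 8;
}
```

How the committed code stores the result is not shown in the log.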


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
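
The "conditionally extended" family/model values referred to in rev 1.16
follow the usual CPUID(1).EAX signature rules: the extended family is
added only when the base family is 0xf, and the extended model is used
only when the base family is 0x6 or 0xf. A sketch of that decoding is
below; the struct and function names are hypothetical and are not the
cpu_info members the commit added.

```c
#include <stdint.h>

struct cpu_sig {			/* hypothetical container */
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/* Decode family/model/stepping from the CPUID(1).EAX signature. */
static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;
	uint32_t base_family = (sig >> 8) & 0xf;

	s.stepping = sig & 0xf;
	s.model = (sig >> 4) & 0xf;
	s.family = base_family;

	/* The extended model applies only to base families 0x6 and 0xf. */
	if (base_family == 0x6 || base_family == 0xf)
		s.model += ((sig >> 16) & 0xf) << 4;
	/* The extended family applies only to base family 0xf. */
	if (base_family == 0xf)
		s.family += (sig >> 20) & 0xff;
	return s;
}
```

Doing the extraction once and caching the result is what lets later code
compare against plain numbers instead of re-parsing the signature.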


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism, such
as powernow, to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others, thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@
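
The brand string cleanup from rev 1.3 (leading and duplicated spaces),
together with the trailing-space removal added later in rev 1.56, can be
sketched as an in-place pass over the string. This is an illustrative
version with a hypothetical function name, not the code that was
backported from i386.

```c
/*
 * Normalize a CPU brand string in place: drop leading spaces, collapse
 * runs of spaces to a single space, and drop trailing spaces.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	while (*src == ' ')
		src++;			/* drop leading spaces */
	while (*src != '\0') {
		/* skip a space that is followed by a space or the end */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

Since the output is never longer than the input, no extra buffer is needed.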


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.134 21-Jul-2023 guenther

Rename ARCH_CAPABILITIES_* #defines to ARCH_CAP_*
Provide more ARCH_CAP_* defines per June 2023 SDM

ok jsg@ deraadt@


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
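
The class of bug fixed in rev 1.122 is easy to show in isolation: with a
32-bit int, 1 << 31 shifts into the sign bit and is undefined, while the
unsigned form is well defined. The macro and function names below are
hypothetical and only illustrate the pattern; they are not taken from the
kernel headers.

```c
#include <stdint.h>

/* Use an unsigned constant so that bit 31 can be formed safely. */
#define SKETCH_BIT(n)	(1U << (n))	/* hypothetical macro */

/* Test whether bit 31 is set in a 32-bit feature word. */
static int
has_bit31(uint32_t flags)
{
	return (flags & SKETCH_BIT(31)) != 0;
}
```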


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids are divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
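
The APIC ID split described in rev 1.114 can be sketched as below. This is
a minimal sketch, not the committed code: the struct and function names
are hypothetical, apic_id_size is the number of low bits that address
cores/threads inside a package (the "ApicIdSize" value), and it is assumed
here that SMT siblings occupy adjacent IDs so that division and remainder
by ThreadsPerCore yield the core and thread numbers.

```c
#include <assert.h>
#include <stdint.h>

struct cpu_topo {			/* hypothetical container */
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/* Cut an initial local APIC ID into package, core, and SMT ids. */
static struct cpu_topo
split_apic_id(uint32_t apic_id, unsigned int apic_id_size,
    unsigned int threads_per_core)
{
	struct cpu_topo t;
	uint32_t in_pkg;

	assert(apic_id_size < 32);
	assert(threads_per_core >= 1);	/* defaults to 1 on older chips */

	/* Everything above the low apic_id_size bits names the package. */
	t.pkg_id = apic_id >> apic_id_size;
	/* The masked-off remainder identifies a core/thread in the package. */
	in_pkg = apic_id & ((1U << apic_id_size) - 1);
	t.core_id = in_pkg / threads_per_core;
	t.smt_id = in_pkg % threads_per_core;
	return t;
}
```

With threads_per_core left at 1, every ID inside a package maps to its own
core, which matches the stated default for pre-Zen parts.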


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used; otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh
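
The "check against other already-id'd cpus" idea can be sketched roughly as below. This is an assumption-laden illustration, not the committed code: struct cpu_topo and assign_smt_id() are hypothetical names, and the core and node IDs are assumed to have been extracted from cpuid 0x8000001e already.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the topology fields of struct cpu_info. */
struct cpu_topo {
	uint32_t core_id;
	uint32_t node_id;
	uint32_t smt_id;
};

/*
 * Assign an SMT id to cpus[n] by counting how many previously
 * identified CPUs (indices 0..n-1) report the same core and node.
 * The IDs are only compared for equality, so gaps in the numbering
 * do not matter.
 */
void
assign_smt_id(struct cpu_topo *cpus, size_t n)
{
	uint32_t smt = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (cpus[i].core_id == cpus[n].core_id &&
		    cpus[i].node_id == cpus[n].node_id)
			smt++;
	cpus[n].smt_id = smt;
}
```

Because the IDs are treated as opaque numbers, non-contiguous core numbering like that seen on the EPYC 7551P does not affect the result.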


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
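
The arithmetic behind such a recalibration can be sketched as follows. This is only an illustration of the principle under stated assumptions: the function name is hypothetical, and the deltas are assumed to have been sampled over the same interval from the TSC and from a reference counter of known frequency (for example HPET or the PM timer).

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the TSC frequency in Hz from the
 * number of TSC ticks (tsc_delta) and reference-counter ticks
 * (ref_delta) observed over the same interval, given the known
 * frequency of the reference counter (ref_freq, in Hz).
 *
 * Assumes tsc_delta * ref_freq fits in 64 bits, which holds for
 * sub-second measurement windows with typical reference clocks.
 */
uint64_t
tsc_freq_from_reference(uint64_t tsc_delta, uint64_t ref_delta,
    uint64_t ref_freq)
{
	if (ref_delta == 0)
		return 0;	/* no usable measurement */
	return tsc_delta * ref_freq / ref_delta;
}
```

The precision of the result is bounded by the resolution of the reference counter, which is the point the entry makes about measurement quality.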


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
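
A minimal sketch of the kind of guard this entry describes is shown below. The function name is hypothetical; max_basic plays the role of cpuid_level and max_ext the role of ci_pnfeatset (the highest extended leaf), both passed in as plain values so the logic can be exercised without executing CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether a CPUID leaf may be queried,
 * given the highest basic leaf (max_basic) and the highest extended
 * leaf (max_ext) that the CPU reports. Feature flags alone are not
 * trusted, because firmware can cap the available leaves.
 */
bool
cpuid_leaf_usable(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000U)
		return max_ext >= 0x80000000U && leaf <= max_ext;
	return leaf <= max_basic;
}
```

For example, a caller would check cpuid_leaf_usable(0xd, ...) before reading the XSAVE state enumeration leaf, even when the feature flags suggest AVX is present.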


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
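
The combined cleanup (leading, duplicated, and trailing spaces, see also 1.3) can be sketched in a single in-place pass. This is an illustrative sketch, not the routine used in identcpu.c; the function name is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: normalize a brand string in place by dropping
 * leading spaces, collapsing runs of spaces to one, and dropping
 * trailing spaces. Returns the new length.
 */
size_t
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	while (*src == ' ')
		src++;			/* leading spaces */
	while (*src != '\0') {
		/* A space followed by a space or by the end is dropped. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```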


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
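
As an illustration of what "conditionally extended" means, the sketch below decodes a CPUID leaf 1 EAX signature using the commonly documented rule: the extended family is added only when the base family is 0xf, and the extended model is applied only when the base family is 0x6 or 0xf. The struct and function names are hypothetical, and the exact conditions used in identcpu.c may differ.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature fields. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/* Decode family/model/stepping from the CPUID leaf 1 EAX value. */
struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;
	uint32_t base_family = (sig >> 8) & 0xf;

	s.stepping = sig & 0xf;
	s.family = base_family;
	s.model = (sig >> 4) & 0xf;
	if (base_family == 0xf)
		s.family += (sig >> 20) & 0xff;		/* extended family */
	if (base_family == 0x6 || base_family == 0xf)
		s.model |= ((sig >> 16) & 0xf) << 4;	/* extended model */
	return s;
}
```

Computing these once and storing them is what lets later code compare against values such as ci_model without repeating the bit extraction.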


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.133 22-Apr-2023 guenther

Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.

Add #defines for masking bits for xcr0 vs XSS.

Add and report the new XSAVE_XFD xsave subfeature bit.

ok mlarkin@


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
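
A short illustration of the class of bug involved, assuming a 32-bit int: the constant 1 has type int, so shifting it left by 31 produces a value that int cannot represent. Using an unsigned operand makes the shift well defined. The macro and function names here are hypothetical.

```c
#include <stdint.h>

/*
 * Undefined with a 32-bit int, because the operand is signed:
 *	#define FLAG_BIT31	(1 << 31)
 * Well defined, because the operand is unsigned:
 */
#define FLAG_BIT31	(1U << 31)

/* Set the top bit of a 32-bit flag word. */
uint32_t
set_bit31(uint32_t flags)
{
	return flags | FLAG_BIT31;
}
```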


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
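
The splitting described above can be sketched as follows. This is a sketch of the description only, not the committed code: the struct and function names are hypothetical, apic_id_size and threads_per_core are assumed to have been read from CPUID already, and apic_id_size is assumed to be less than 32.

```c
#include <stdint.h>

/* Hypothetical container for the three topology IDs. */
struct topo {
	uint32_t pkg;
	uint32_t core;
	uint32_t smt;
};

/*
 * Cut an initial local APIC ID into package, core, and SMT IDs.
 * The low apic_id_size bits identify the core/thread within the
 * package; those are then divided up by threads_per_core.
 */
struct topo
split_apicid(uint32_t apicid, uint32_t apic_id_size,
    uint32_t threads_per_core)
{
	struct topo t;
	uint32_t mask = (1U << apic_id_size) - 1;
	uint32_t rest = apicid & mask;

	if (threads_per_core == 0)
		threads_per_core = 1;	/* default when not reported */
	t.pkg = apicid >> apic_id_size;
	t.core = rest / threads_per_core;
	t.smt = rest % threads_per_core;
	return t;
}
```

For instance, with a 4-bit ApicIdSize and two threads per core, APIC ID 0x13 lands in package 1, core 1, thread 1.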


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used; otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpus with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
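
The calibration idea in this commit message can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from this revision: the function name tsc_freq_from_reference() and its parameters are hypothetical, and the sketch assumes both counters were sampled over the same interval.

```c
#include <stdint.h>

/*
 * Estimate the TSC frequency in Hz from a reference counter with a
 * known frequency (hypothetical helper, not the kernel's function).
 *
 * tsc_delta: TSC ticks elapsed during the measurement window
 * ref_delta: reference counter ticks elapsed during the same window
 * ref_freq:  known frequency of the reference counter in Hz
 *
 * The product tsc_delta * ref_freq must fit in 64 bits, so the
 * measurement window has to stay short (well under a second here).
 */
static uint64_t
tsc_freq_from_reference(uint64_t tsc_delta, uint64_t ref_delta,
    uint64_t ref_freq)
{
	if (ref_delta == 0)
		return 0;	/* no usable measurement */
	return tsc_delta * ref_freq / ref_delta;
}
```

With a hypothetical 14318180 Hz reference counter that advances 1431818 ticks while the TSC advances 300000000 ticks, the estimate comes out at 3000000000 Hz.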


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
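
A minimal sketch of why Tj(Max) matters for the sensor value: the thermal status readout counts degrees below Tj(Max), so the absolute temperature is the difference of the two. The helper names are hypothetical, and the bit positions (Tj(Max) in bits 23:16 of MSR_TEMPERATURE_TARGET, digital readout in bits 22:16 of IA32_THERM_STATUS) are an assumption taken from Intel's documentation rather than from this commit.

```c
#include <stdint.h>

/*
 * Extract Tj(Max) in degrees C from a raw MSR_TEMPERATURE_TARGET
 * value. Assumes the target sits in bits 23:16.
 */
static unsigned int
tjmax_from_msr(uint64_t temp_target)
{
	return (unsigned int)((temp_target >> 16) & 0xff);
}

/*
 * Convert a raw IA32_THERM_STATUS value into degrees C. The digital
 * readout (assumed to be bits 22:16) is the distance below Tj(Max).
 */
static int
core_temp_celsius(uint64_t therm_status, unsigned int tjmax)
{
	unsigned int below = (unsigned int)((therm_status >> 16) & 0x7f);

	return (int)tjmax - (int)below;
}
```

For example, a target value of 0x00641400 gives a Tj(Max) of 100, and a status value of 0x88280000 reads 40 degrees below that, so 60 degrees C.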


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
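
Together with the leading and duplicated space removal backported in rev 1.3, the brand string cleanup might look like the sketch below. This is an assumed standalone version for illustration; brand_squeeze() is a hypothetical name and not the routine in identcpu.c.

```c
/*
 * Clean up a CPU brand string in place: drop leading spaces,
 * collapse runs of spaces into one, and drop trailing spaces.
 * (Hypothetical helper for illustration.)
 */
static void
brand_squeeze(char *s)
{
	char *src = s, *dst = s;
	int pending_space = 0;

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	for (; *src != '\0'; src++) {
		if (*src == ' ') {
			/* Remember the gap, emit it only if text follows. */
			pending_space = 1;
			continue;
		}
		if (pending_space) {
			*dst++ = ' ';
			pending_space = 0;
		}
		*dst++ = *src;
	}
	/* A pending space at the end is trailing and is never written. */
	*dst = '\0';
}
```

Because a space is only written once a non-space character follows it, trailing spaces disappear without a second pass over the string.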


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and core relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
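
As a rough illustration of what "conditionally extended" means here, the sketch below decodes a CPUID(1) EAX signature following the convention in Intel's CPUID documentation. The struct and function names are hypothetical, and the exact conditions used in identcpu.c may differ from this sketch.

```c
#include <stdint.h>

/* Hypothetical result type; the kernel keeps these in struct cpu_info. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/*
 * Decode family/model/stepping from the CPUID(1) EAX signature.
 * The extended fields only apply for certain base family values.
 */
static struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig s;
	unsigned int base_family = (eax >> 8) & 0x0f;

	s.stepping = eax & 0x0f;
	s.family = base_family;
	s.model = (eax >> 4) & 0x0f;

	/* Extended family (bits 27:20) is added only for base family 0xf. */
	if (base_family == 0x0f)
		s.family += (eax >> 20) & 0xff;

	/* Extended model (bits 19:16) is the high nibble for 6 and 0xf. */
	if (base_family == 0x06 || base_family == 0x0f)
		s.model |= ((eax >> 16) & 0x0f) << 4;

	return s;
}
```

A signature of 0x000306c3 decodes to family 0x6, model 0x3c, stepping 3, while 0x00800f11 decodes to family 0x17, model 0x01, stepping 1.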


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.132 26-Mar-2023 mlarkin

amd64: identify IBT capability in cpu(4) dmesg lines

requested by and ok deraadt@


Revision tags: OPENBSD_7_3_BASE
# 1.131 14-Jan-2023 jsg

recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
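
For readers unfamiliar with the issue: with a 32-bit int, the expression 1 << 31 shifts a one into the sign bit, which the C standard leaves undefined. A common fix, sketched below with a hypothetical helper name, is to make the shifted constant unsigned. The actual lines changed in this revision are not reproduced here.

```c
#include <stdint.h>

/*
 * Return a 32-bit mask with only bit n set, for n in 0..31.
 * Using 1U keeps the shift in unsigned arithmetic, so setting
 * bit 31 is well defined. (Hypothetical helper for illustration.)
 */
static uint32_t
bit32(unsigned int n)
{
	return 1U << n;
}
```

The same reasoning applies to flag macros: writing the constant as 0x80000000U or (1U << 31) avoids the signed overflow that (1 << 31) would trigger.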


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
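
The decomposition described in this commit message can be sketched as follows. This is a userland illustration under stated assumptions, not the code from this revision: the struct and function names are hypothetical, and the caller is assumed to have already read the initial local APIC ID, ApicIdSize, and ThreadsPerCore from cpuid.

```c
#include <stdint.h>

/* Hypothetical container for the three IDs; not struct cpu_info. */
struct topo_ids {
	uint32_t pkg;
	uint32_t core;
	uint32_t smt;
};

/*
 * Split an initial local APIC ID into package, core and smt IDs.
 * apicid_size is the number of low bits that address cores and
 * threads inside one package; threads_per_core defaults to 1.
 */
static struct topo_ids
amd_split_apicid(uint32_t apicid, unsigned int apicid_size,
    unsigned int threads_per_core)
{
	struct topo_ids t;
	uint32_t within;

	if (threads_per_core == 0)
		threads_per_core = 1;	/* older parts: one thread per core */
	if (apicid_size > 31)
		apicid_size = 31;	/* keep the shifts well defined */

	/* Bits above apicid_size select the package. */
	t.pkg = apicid >> apicid_size;

	/* The remaining bits are divided up by ThreadsPerCore. */
	within = apicid & ((1U << apicid_size) - 1);
	t.core = within / threads_per_core;
	t.smt = within % threads_per_core;

	return t;
}
```

For an APIC ID of 0x1b with an ApicIdSize of 4 and two threads per core, this yields package 1, core 5, smt 1.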


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
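
The check-before-write idea can be sketched as follows. This is only an
illustration under stated assumptions: `struct fake_msr` and
`msr_set_bit_if_clear()` are hypothetical stand-ins for the real MSR
accessors, and the bit position follows the "C001_1029[1]" wording of
the 1.103 entry.

```c
#include <stdint.h>

/* Bit 1 of the AMD DE_CFG MSR, per "MSR C001_1029[1]" in rev 1.103. */
#define DE_CFG_SERIALIZE_LFENCE_BIT	(1ULL << 1)

/* Hypothetical simulated MSR; counts writes so the effect is visible. */
struct fake_msr {
	uint64_t value;
	unsigned int writes;
};

/*
 * Write the register only when the requested bit is still clear, so a
 * hypervisor that faults on writes is never touched needlessly.
 */
static void
msr_set_bit_if_clear(struct fake_msr *m, uint64_t bit)
{
	if ((m->value & bit) == 0) {
		m->value |= bit;
		m->writes++;
	}
}
```

Calling the helper twice on the same register performs one write at
most; the second call is a pure read.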


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh
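
A minimal sketch of the approach described in this entry, not the
committed code: `struct topo` and `topo_identify()` are hypothetical
names, the raw EBX/ECX values are passed in rather than read with
CPUID, and the field positions (CoreId in EBX[7:0], NodeId in ECX[7:0])
are taken from the cited PPR as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU record; not the kernel's struct cpu_info. */
struct topo {
	uint32_t core_id;	/* CPUID 0x8000001e EBX[7:0] */
	uint32_t node_id;	/* CPUID 0x8000001e ECX[7:0] */
	uint32_t smt_id;	/* derived by comparing with earlier CPUs */
};

/*
 * Record core and node for CPU number n_seen, then derive its SMT id
 * by counting already-identified CPUs with the same core and node.
 * The ids are only compared for equality, never assumed contiguous.
 */
static void
topo_identify(struct topo *cpus, size_t n_seen, uint32_t ebx, uint32_t ecx)
{
	struct topo *ci = &cpus[n_seen];
	size_t i;

	ci->core_id = ebx & 0xff;
	ci->node_id = ecx & 0xff;
	ci->smt_id = 0;
	for (i = 0; i < n_seen; i++) {
		if (cpus[i].core_id == ci->core_id &&
		    cpus[i].node_id == ci->node_id)
			ci->smt_id++;
	}
}
```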


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
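
The guard can be pictured with the sketch below. It assumes that the
highest basic leaf comes from CPUID(0).eax and the highest extended
leaf from CPUID(0x80000000).eax; `cpuid_leaf_usable()` is a
hypothetical helper, not a function in the tree.

```c
#include <stdint.h>

/*
 * Return nonzero when `leaf` may be queried, given the highest basic
 * and highest extended leaf the CPU (or firmware) reports.  Feature
 * flags alone are not trusted to imply that a leaf exists.
 */
static int
cpuid_leaf_usable(uint32_t max_basic, uint32_t max_ext, uint32_t leaf)
{
	if (leaf >= 0x80000000U)
		return leaf <= max_ext;
	return leaf <= max_basic;
}
```

With a firmware-limited maximum of 0x4, for example, a query of leaf
0xd is rejected regardless of what the feature bits suggest.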


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@
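
The preference order reads roughly like the sketch below. The function
pointers stand in for the rdseed and rdrand instructions so that the
example runs anywhere; `hwrng_read()`, `fixed_source()` and
`empty_source()` are hypothetical names used for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a hardware RNG instruction: 1 on success, 0 if empty. */
typedef int (*hwrng_fn)(uint64_t *);

/* Demo source that always has data available. */
static int
fixed_source(uint64_t *v)
{
	*v = 42;
	return 1;
}

/* Demo source that is always drained. */
static int
empty_source(uint64_t *v)
{
	(void)v;
	return 0;
}

/*
 * Try the seed source first and fall back to the PRNG-backed source.
 * Returns 1 if the seed source delivered, 2 if the fallback did,
 * and 0 if neither produced a value.
 */
static int
hwrng_read(hwrng_fn seed, hwrng_fn rnd, uint64_t *out)
{
	if (seed != NULL && seed(out))
		return 1;
	if (rnd != NULL && rnd(out))
		return 2;
	return 0;
}
```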


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
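
As a rough illustration of why Tj(Max) matters: the thermal status
register reports degrees below Tj(Max), so the absolute temperature is
a subtraction. The sketch assumes the Intel SDM layout (target in bits
23:16 of MSR_TEMPERATURE_TARGET, digital readout in bits 22:16 of
IA32_THERM_STATUS) and takes raw register values as parameters;
`core_temp_celsius()` is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Convert raw MSR values to degrees Celsius.  The readout counts
 * degrees below Tj(Max), so a wrong Tj(Max) shifts every reading.
 */
static int
core_temp_celsius(uint64_t temp_target, uint64_t therm_status)
{
	int tjmax = (int)((temp_target >> 16) & 0xff);
	int below = (int)((therm_status >> 16) & 0x7f);

	return tjmax - below;
}
```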


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
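
A small sketch of the clean-up, assuming the brand string is an
ordinary NUL-terminated buffer. `brand_squeeze()` is a hypothetical
name; besides the trailing spaces mentioned here it also drops leading
and repeated spaces, which the log attributes to rev 1.3.

```c
/*
 * Squeeze a brand string in place: skip leading spaces, collapse runs
 * of spaces to one, and drop any space that precedes the terminator.
 */
static void
brand_squeeze(char *s)
{
	char *src = s, *dst = s;

	while (*src == ' ')
		src++;
	while (*src != '\0') {
		/* Skip a space followed by another space or the end. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```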


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
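
For readers unfamiliar with the "conditionally extended" values, the
sketch below decodes a CPUID(1).eax signature using the architectural
rule (extended family applies when the base family is 0xf, extended
model when it is 0x6 or 0xf). `struct cpu_sig` and
`decode_signature()` are hypothetical names, and the committed code may
apply the extension under slightly different conditions.

```c
#include <stdint.h>

/* Hypothetical holder for the decoded values. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/* Decode family/model/stepping once so callers need not repeat it. */
static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig r;
	uint32_t base_family = (sig >> 8) & 0xf;

	r.family = base_family;
	r.model = (sig >> 4) & 0xf;
	r.stepping = sig & 0xf;
	if (base_family == 0xf)
		r.family += (sig >> 20) & 0xff;		/* extended family */
	if (base_family == 0x6 || base_family == 0xf)
		r.model += ((sig >> 16) & 0xf) << 4;	/* extended model */
	return r;
}
```

For example, the signature 0x000306a9 decodes to family 0x6, model
0x3a, stepping 9.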


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior than that is used, otherwise
use the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadtipper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
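
The arithmetic behind such a recalibration can be sketched as below. This
is a minimal illustration and not the code from this commit: the helper
name tsc_hz_from_reference and its parameters are hypothetical, and it
assumes both counters were sampled over the same interval.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate the TSC frequency from a reference
 * timecounter (for example HPET or the PM timer) whose frequency is
 * known.  tsc_delta and ref_delta are tick counts taken over the same
 * interval.  The product can overflow for very long intervals, so the
 * sampling window is assumed to be short.
 */
static uint64_t
tsc_hz_from_reference(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_hz)
{
	assert(ref_delta != 0);
	/* elapsed seconds = ref_delta / ref_hz; tsc_hz = tsc_delta / elapsed */
	return (tsc_delta * ref_hz) / ref_delta;
}
```

The result can only be as precise as the reference counter's resolution,
which is the point the commit message makes.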


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
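
A minimal sketch of this kind of brand string cleanup follows. It is an
illustration only, not the kernel's code: the helper name squeeze_spaces
is hypothetical, and it also collapses repeated internal spaces, which
the log attributes to an earlier revision (1.3).

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: remove leading and trailing spaces from s and
 * collapse each internal run of spaces to a single space, in place.
 */
static void
squeeze_spaces(char *s)
{
	char *src, *dst;

	assert(s != NULL);
	dst = s;
	src = s + strspn(s, " ");	/* skip leading spaces */
	while (*src != '\0') {
		/* drop a space that precedes another space or the end */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

Calling it on a buffer holding "  Intel(R)  Core(TM)   i7  " leaves
"Intel(R) Core(TM) i7" in the buffer.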


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.130 10-Jan-2023 dv

Hide WAITPKG cpu feature from vmm(4) guests.

Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.

This also adds WAITPKG to i386 and amd64 cpu feature identification.

Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.

OK deraadt.


Revision tags: OPENBSD_7_2_BASE
# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
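
The class of bug can be illustrated as follows. With a 32-bit int, the
expression 1 << 31 shifts into the sign bit, which C leaves undefined;
the usual remedy is to shift an unsigned constant. This is a sketch of
that remedy only; the helper name bit32 is hypothetical and the lines
actually changed by this commit are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a 32-bit mask with only bit n set. */
static uint32_t
bit32(unsigned n)
{
	assert(n < 32);
	/* the left operand is unsigned, so reaching bit 31 is well defined */
	return UINT32_C(1) << n;
}
```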


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior than that is used, otherwise
use the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadtipper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
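
The check described in 1.104 can be sketched as a pure function. This is
only an illustration under stated assumptions, not the committed code: the
helper name lfence_msr_update() is hypothetical, the MSR number and bit
position are taken from the "MSR C001_1029[1]" wording in 1.103, and the
actual MSR read and write are left to the caller.

```c
#include <stdint.h>

#define MSR_DE_CFG              0xc0011029u /* "MSR C001_1029" per rev 1.103 */
#define DE_CFG_SERIALIZE_LFENCE (1ULL << 1) /* bit 1: LFENCE is dispatch serializing */

/*
 * Hypothetical helper: given the current MSR value, decide whether a
 * write is needed.  Returns 1 and stores the value to write in *newval
 * when the bit is clear; returns 0 (no write at all) when it is already
 * set, which avoids the #GP some hypervisors raise on writes.
 */
static int
lfence_msr_update(uint64_t cur, uint64_t *newval)
{
	if (cur & DE_CFG_SERIALIZE_LFENCE)
		return 0;
	*newval = cur | DE_CFG_SERIALIZE_LFENCE;
	return 1;
}
```

A caller would read the MSR, call the helper, and only issue the write
when it returns 1.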


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
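
The arithmetic behind such a recalibration can be sketched as below. This
is a minimal illustration, not the kernel's implementation: the function
name and parameters are hypothetical, and it assumes both counters were
sampled over the same interval and that the product fits in 64 bits
(true for short measurement windows).

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the TSC frequency from a reference
 * timecounter (for example HPET or the PM timer) of known frequency.
 *
 *   tsc_delta: TSC ticks elapsed during the measurement
 *   ref_delta: reference counter ticks elapsed over the same interval
 *   ref_hz:    known frequency of the reference counter
 *
 * Returns 0 if the reference counter did not advance.
 */
static uint64_t
tsc_freq_from_ref(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_hz)
{
	if (ref_delta == 0)
		return 0;
	return tsc_delta * ref_hz / ref_delta;
}
```

The result is only as precise as the reference counter's resolution,
which is the point the log entry makes.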


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that the TSC is not used if we really don't want it to be.
The kernel bumps the quality to 2000 for constant, invariant TSCs on
the latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
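
Taken together with the leading and duplicated space removal from 1.3,
the brand string cleanup can be sketched as a single in-place pass. This
is an illustration only: squeeze_spaces() is a hypothetical name and the
kernel code may be structured differently.

```c
/*
 * Hypothetical helper: normalize a CPU brand string in place by
 * dropping leading spaces, collapsing runs of spaces to one, and
 * dropping trailing spaces.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	/* skip leading spaces */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* drop a space that is followed by a space or the end */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```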


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes, the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.129 22-Sep-2022 robert

Call amd64_errata() from cpu_fix_msrs() instead of identifycpu() so that
on resume, the errata is re-applied.
In addition make amd64_errata() print the information about the applied
errata only once for the first CPU.

input from jsg@ and deraadt@, ok deraadt@


# 1.128 20-Sep-2022 robert

Split out handling of cpu family specific MSRs from cpu_init_msrs()
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.

Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.

discussed with and input from deraadt@, mlarkin@


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
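
As a reminder of why this matters: with a 32-bit int, 1 << 31 shifts a
one into the sign bit of a signed type, which the C standard leaves
undefined, while 1U << 31 is well defined. The sketch below shows the
unsigned form; the macro and function names are hypothetical and not
taken from the kernel headers.

```c
#include <stdint.h>

/* Unsigned constant: shifting by 31 is well defined, unlike (1 << 31). */
#define FEATURE_BIT(n)	(1U << (n))

/* Hypothetical helper: test bit n (0..31) of a 32-bit CPUID register. */
static int
has_feature(uint32_t reg, unsigned int n)
{
	return (reg & FEATURE_BIT(n)) != 0;
}
```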


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
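
The splitting described in 1.114 can be sketched as follows. This is a
minimal illustration under assumptions, not the committed code: the
struct and function names are hypothetical, apicid_size is the
"ApicIdSize" bit count and threads_per_core the "ThreadsPerCore" value
named in the log entry, and how those values are read from CPUID is left
out.

```c
#include <stdint.h>

/* Hypothetical result type for the three topology IDs. */
struct topo {
	uint32_t pkg;	/* package (socket) ID */
	uint32_t core;	/* core ID within the package */
	uint32_t smt;	/* thread ID within the core */
};

/*
 * Hypothetical helper: cut an initial local APIC ID into package, core
 * and smt values.  The low apicid_size bits identify the core/thread
 * within a package; what remains above them is the package.  The
 * in-package part is then divided up by threads_per_core, which
 * defaults to 1 when the CPU does not report a larger value.
 */
static struct topo
split_apicid(uint32_t apicid, unsigned int apicid_size,
    unsigned int threads_per_core)
{
	struct topo t;
	uint32_t in_pkg;

	if (threads_per_core == 0)
		threads_per_core = 1;
	if (apicid_size > 31)
		apicid_size = 31;	/* keep the shifts well defined */

	t.pkg = apicid >> apicid_size;
	in_pkg = apicid & ((1U << apicid_size) - 1);
	t.core = in_pkg / threads_per_core;
	t.smt = in_pkg % threads_per_core;
	return t;
}
```

For example, APIC ID 0x1b with a 4-bit ApicIdSize and two threads per
core would land in package 1, core 5, thread 1 under this scheme.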


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpus with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
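
A minimal sketch of the "only write if not already set" check described
in revisions 1.103 and 1.104. This is not the kernel code: the macro and
function names are hypothetical, and the caller is assumed to perform
the actual MSR read and write. Only the bit position (bit 1 of MSR
C001_1029) comes from the commit message.

```c
#include <stdint.h>

/* Bit 1 of MSR C001_1029, per revision 1.103; the macro name is made up. */
#define DE_CFG_SERIALIZE_LFENCE	(1ULL << 1)

/*
 * Given the current MSR value, decide whether a write is needed.
 * Returns 1 and stores the value to write in *newval when the bit is
 * clear; returns 0 and leaves *newval untouched when it is already set,
 * so the caller can skip the write entirely.
 */
int
lfence_serialize_update(uint64_t cur, uint64_t *newval)
{
	if (cur & DE_CFG_SERIALIZE_LFENCE)
		return 0;
	*newval = cur | DE_CFG_SERIALIZE_LFENCE;
	return 1;
}
```

Skipping the write when the bit is already set is what avoids the #GP
fault on hypervisors that reject writes to this MSR.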


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
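
The recalibration idea in revision 1.88 comes down to one ratio: count
TSC ticks over a window measured by a reference timecounter of known
frequency. The sketch below illustrates only that arithmetic; the
function name and parameters are hypothetical and it is not the code
that was committed.

```c
#include <stdint.h>

/*
 * Estimate the TSC frequency in Hz.
 *   tsc_delta: TSC ticks elapsed during the measurement window
 *   ref_delta: reference timecounter ticks elapsed in the same window
 *   ref_hz:    known frequency of the reference timecounter
 * Returns 0 if the reference did not advance, so the caller can fall
 * back to another method.  The product tsc_delta * ref_hz must fit in
 * 64 bits, which holds for short measurement windows.
 */
uint64_t
tsc_freq_estimate(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_hz)
{
	if (ref_delta == 0)
		return 0;
	return tsc_delta * ref_hz / ref_delta;
}
```

For example, 2400000 TSC ticks observed over 1000 ticks of a 1 MHz
reference correspond to a 2.4 GHz TSC.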


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
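
A rough sketch of the guard described in revision 1.72, assuming the
caller already holds the highest basic leaf (cpuid_level) and the
highest extended leaf (what the kernel keeps in ci_pnfeatset). The
function name is hypothetical.

```c
#include <stdint.h>

/*
 * Return nonzero if a CPUID leaf may be queried.
 *   max_basic: highest basic leaf, from CPUID(0).eax
 *   max_ext:   highest extended leaf, from CPUID(0x80000000).eax
 * A feature flag alone is not enough: firmware can cap the leaf range
 * even when a flag suggests the leaf should exist.
 */
int
cpuid_leaf_usable(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000u)
		return max_ext >= 0x80000000u && leaf <= max_ext;
	return leaf <= max_basic;
}
```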


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
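
To illustrate why Tj(Max) matters in revision 1.58: the thermal sensor
reports degrees below Tj(Max), so the absolute temperature is a
subtraction. The sketch works on raw MSR values supplied by the caller.
The bit positions (Tj(Max) in bits 23:16 of MSR_TEMPERATURE_TARGET,
digital readout in bits 22:16 of the thermal status MSR) are stated here
as assumptions taken from Intel's documentation, and the function names
are hypothetical.

```c
#include <stdint.h>

/* Extract Tj(Max) in degC from a raw MSR_TEMPERATURE_TARGET value. */
unsigned int
tjmax_from_target(uint64_t target_msr)
{
	return (unsigned int)((target_msr >> 16) & 0xff);
}

/*
 * Convert a raw thermal status value to degC.  The digital readout is
 * the distance below Tj(Max), so a larger readout means a cooler core.
 */
int
core_temp_c(uint64_t therm_status, unsigned int tjmax)
{
	return (int)tjmax - (int)((therm_status >> 16) & 0x7f);
}
```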


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
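
For illustration only, the brand string cleanup mentioned here and in
revision 1.3 (leading, duplicated, and trailing spaces) can be sketched
as a single in-place pass. The function name is hypothetical and this is
not the kernel's implementation.

```c
/*
 * Normalize a NUL-terminated brand string in place: drop leading
 * spaces, collapse runs of spaces to one, and drop trailing spaces.
 */
void
brand_normalize(char *s)
{
	char *src = s, *dst = s;

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* Drop a space that is followed by a space or the end. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```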


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
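
As a sketch of what "conditionally extended" means in revision 1.16:
the CPUID(1).eax signature carries 4-bit base family and model fields,
and the extended fields are folded in only for certain base families.
The rule below follows the vendors' published CPUID description; the
struct and function names are hypothetical and the in-tree code may be
arranged differently.

```c
#include <stdint.h>

/* Hypothetical holder for the decoded signature fields. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/* Decode family/model/stepping from a CPUID(1).eax signature. */
struct cpu_sig
cpu_sig_decode(uint32_t signature)
{
	struct cpu_sig s;
	uint32_t base_family = (signature >> 8) & 0x0f;

	s.family = base_family;
	s.model = (signature >> 4) & 0x0f;
	s.stepping = signature & 0x0f;

	/* Extended family is added only when the base family is 0xf. */
	if (base_family == 0x0f)
		s.family += (signature >> 20) & 0xff;
	/* Extended model forms the high nibble for base family 6 or 0xf. */
	if (base_family == 0x06 || base_family == 0x0f)
		s.model += ((signature >> 16) & 0x0f) << 4;
	return s;
}
```

Decoding once and storing the result means later code can compare plain
family and model numbers instead of re-extracting them from the raw
signature.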


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.127 30-Aug-2022 dv

Initial support for mmio assist for vmm(4)

Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.

No decode or assist provided at the moment by vmd(8).

ok mlarkin@


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
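
A short illustration of the issue fixed in revision 1.122: with a 32-bit
int, `1 << 31` shifts into the sign bit of a signed value, which is
undefined behavior in C. Building the mask from an unsigned operand
avoids it. The helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Mask for bit n (0..31) of a 32-bit CPUID register.  The operand is
 * unsigned, so n == 31 is well defined, unlike (1 << 31) on an int.
 */
uint32_t
cpuid_bit(unsigned int n)
{
	return (uint32_t)1 << n;
}

/* Test whether bit n is set in a 32-bit register value. */
int
cpuid_has_bit(uint32_t reg, unsigned int n)
{
	return (reg & cpuid_bit(n)) != 0;
}
```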


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
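
The decomposition described in revision 1.114 can be sketched as below.
This is an illustration under assumptions, not the committed code: the
caller is assumed to have already read the initial local APIC ID, the
ApicIdSize bit count, and the ThreadsPerCore value from CPUID, and the
struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the three topology IDs. */
struct amd_topo {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/*
 * Split an initial local APIC ID into package, core, and smt values.
 *   apic_id_size:     number of low bits that sit below the package ID
 *   threads_per_core: 1 unless the CPU reports a higher value
 */
struct amd_topo
amd_split_apicid(uint32_t apicid, unsigned int apic_id_size,
    unsigned int threads_per_core)
{
	struct amd_topo t;
	uint32_t mask, rest;

	if (threads_per_core == 0)
		threads_per_core = 1;

	/* Guard against shifting a 32-bit value by 32 or more. */
	if (apic_id_size >= 32) {
		mask = 0xffffffffu;
		t.pkg_id = 0;
	} else {
		mask = ((uint32_t)1 << apic_id_size) - 1;
		t.pkg_id = apicid >> apic_id_size;
	}

	/* What remains below the package bits is core and thread. */
	rest = apicid & mask;
	t.core_id = rest / threads_per_core;
	t.smt_id = rest % threads_per_core;
	return t;
}
```

With an APIC ID of 0x1b, an ApicIdSize of 4, and two threads per core,
this yields package 1, core 5, thread 1.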


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown is
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpus with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
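
The read-before-write pattern described in this entry can be sketched as
follows. This is only an illustration, not the committed code: the ex_*
names are hypothetical, and the MSR is simulated with a static variable so
the example runs in userland. The MSR number and bit position are taken
from the rev 1.103 log message (MSR C001_1029, bit 1).

```c
#include <stdint.h>

/* Hypothetical names; MSR C001_1029 bit 1 as quoted in the rev 1.103 log. */
#define EX_MSR_DE_CFG			0xc0011029U
#define EX_DE_CFG_SERIALIZE_LFENCE	(1ULL << 1)

/* Simulated MSR storage so the sketch can run outside the kernel. */
static uint64_t ex_msr_value;
static unsigned int ex_msr_writes;

static uint64_t
ex_rdmsr(uint32_t msr)
{
	(void)msr;
	return ex_msr_value;
}

static void
ex_wrmsr(uint32_t msr, uint64_t value)
{
	(void)msr;
	ex_msr_value = value;
	ex_msr_writes++;
}

/* Write the MSR only when the serialization bit is not already set. */
static void
ex_enable_lfence_serialize(void)
{
	uint64_t value = ex_rdmsr(EX_MSR_DE_CFG);

	if ((value & EX_DE_CFG_SERIALIZE_LFENCE) == 0)
		ex_wrmsr(EX_MSR_DE_CFG, value | EX_DE_CFG_SERIALIZE_LFENCE);
}
```

Skipping the write when the bit is already set avoids touching the MSR at
all on hosts where a hypervisor has pre-set it, which is the case the entry
says may fault.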


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
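
The arithmetic behind such a recalibration can be sketched as below. This
is a minimal illustration under stated assumptions, not the committed code:
ex_tsc_frequency() is a hypothetical helper that takes the TSC ticks and
reference-timecounter ticks observed over the same interval, plus the known
reference frequency in Hz.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the TSC frequency in Hz from counts taken
 * over the same interval on the TSC and on a reference counter of known
 * frequency (for example an HPET or ACPI PM timer).
 * Returns 0 if the reference counter did not advance.
 * The product can overflow 64 bits for very long intervals; a short
 * measurement window is assumed here.
 */
static uint64_t
ex_tsc_frequency(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_hz)
{
	if (ref_delta == 0)
		return 0;
	return tsc_delta * ref_hz / ref_delta;
}
```

For instance, 240000000 TSC ticks observed while a 14318180 Hz reference
advanced by 1431818 ticks (a tenth of a second) gives 2400000000 Hz.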


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
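
Why Tj(Max) matters for accuracy can be sketched as below. This is an
illustration only, not the committed code: the ex_* helpers are
hypothetical, and the bit positions (target in bits 23:16 of the
temperature target MSR, digital readout in bits 22:16 of the thermal
status MSR) are my assumption from Intel's documentation and should be
checked against the SDM.

```c
#include <stdint.h>

/* Assumed layout: Tj(Max) in bits 23:16 of the temperature target MSR. */
static int
ex_tjmax(uint64_t target_msr)
{
	return (int)((target_msr >> 16) & 0xff);
}

/*
 * Assumed layout: the thermal status MSR reports degrees *below* Tj(Max)
 * in bits 22:16, so the absolute temperature is Tj(Max) minus the readout.
 */
static int
ex_core_temp(uint64_t target_msr, uint64_t status_msr)
{
	int below = (int)((status_msr >> 16) & 0x7f);

	return ex_tjmax(target_msr) - below;
}
```

With a Tj(Max) of 100 and a readout of 30, the sketch reports 70 degrees C;
a hard-coded Tj(Max) that is off by 10 would shift every reading by 10.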


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
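
The "conditionally extended" decoding mentioned in this entry can be
sketched as below. This is an illustration, not the committed code: the
struct and function names are hypothetical, and the rule used (add the
extended model for base families 6 and 0xf, add the extended family only
for base family 0xf) is the one I recall from Intel's CPUID documentation;
the kernel's exact conditions may differ.

```c
#include <stdint.h>

/* Hypothetical container for the decoded CPUID(1).EAX signature. */
struct ex_cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

static struct ex_cpu_sig
ex_decode_signature(uint32_t sig)
{
	struct ex_cpu_sig s;

	s.stepping = sig & 0x0f;
	s.model = (sig >> 4) & 0x0f;
	s.family = (sig >> 8) & 0x0f;

	/* Extended model: checked against the base family, before it changes. */
	if (s.family == 0x6 || s.family == 0xf)
		s.model += ((sig >> 16) & 0x0f) << 4;
	/* Extended family is only added when the base family is 0xf. */
	if (s.family == 0xf)
		s.family += (sig >> 20) & 0xff;

	return s;
}
```

Decoding once and storing the result keeps every later consumer from
repeating these shifts and masks on the raw signature.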


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@
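
The cleanup described in this entry can be sketched as below. This is an
illustration, not the backported diff: ex_squeeze_spaces() is a
hypothetical helper that drops leading spaces and collapses runs of spaces
in place. It does not strip a single trailing space; the log shows that
was handled separately in rev 1.56.

```c
/*
 * Hypothetical helper: remove leading spaces and collapse each run of
 * spaces into one, editing the NUL-terminated string in place.
 */
static void
ex_squeeze_spaces(char *s)
{
	char *src = s;
	char *dst = s;

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* Drop a space that is followed by another space. */
		if (src[0] == ' ' && src[1] == ' ') {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

Because the write pointer never passes the read pointer, no second buffer
is needed.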


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.126 07-Aug-2022 guenther

Start to add annotations to the cpu_info members, doing I/a/o for
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1

ok jsg@ mlarkin@


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
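
The class of fix described here can be sketched as below. The macro and
function names are hypothetical, chosen only for illustration: with a
32-bit int, (1 << 31) shifts into the sign bit, which is undefined
behavior, while the unsigned form is well defined.

```c
#include <stdint.h>

/*
 * Hypothetical flag for illustration. Writing (1 << 31) here would shift
 * a signed int into its sign bit; the U suffix makes the shift unsigned.
 */
#define EX_FLAG_BIT31	(1U << 31)

/* Returns 1 if bit 31 of a 32-bit register value is set, else 0. */
static int
ex_has_bit31(uint32_t reg)
{
	return (reg & EX_FLAG_BIT31) != 0;
}
```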


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
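
The splitting of the initial local APIC ID described above can be sketched
as below. This is a minimal illustration under stated assumptions, not the
committed code: the struct and function names are hypothetical,
apic_id_size is the number of low APIC ID bits that belong to the package
("ApicIdSize"), and threads_per_core is the "ThreadsPerCore" value, treated
as 1 when unknown.

```c
#include <stdint.h>

/* Hypothetical result type for the sketch. */
struct ex_topology {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

static struct ex_topology
ex_split_apicid(uint32_t apicid, unsigned int apic_id_size,
    uint32_t threads_per_core)
{
	struct ex_topology t;
	uint32_t in_pkg;

	if (threads_per_core == 0)
		threads_per_core = 1;	/* default when not reported */

	if (apic_id_size >= 32) {
		/* Avoid an out-of-range shift; everything is one package. */
		t.pkg_id = 0;
		in_pkg = apicid;
	} else {
		/* High bits name the package, low bits the core/thread. */
		t.pkg_id = apicid >> apic_id_size;
		in_pkg = apicid & ((1U << apic_id_size) - 1);
	}

	/* Divide what remains by the number of threads per core. */
	t.core_id = in_pkg / threads_per_core;
	t.smt_id = in_pkg % threads_per_core;
	return t;
}
```

For example, APIC ID 27 with a 4-bit ApicIdSize and 2 threads per core
yields package 1, core 5, thread 1.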


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future CPUs which don't have the bug, (2) CPUs with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
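
As an illustration only (this is not the code from the commit), the
"conditionally extended" family/model decoding can be sketched as below.
It assumes the rule documented for CPUID leaf 1 EAX: the extended family
is added only when the base family is 0xf, and the extended model is
applied when the base family is 0x6 or 0xf. The struct and function names
are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the decoded CPUID(1).EAX signature. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/*
 * Decode a CPUID leaf 1 EAX signature. The extended fields are
 * only folded in when the base family calls for it (assumed rule,
 * see the lead-in above).
 */
static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;
	uint32_t base_family = (sig >> 8) & 0xf;
	uint32_t base_model = (sig >> 4) & 0xf;
	uint32_t ext_family = (sig >> 20) & 0xff;
	uint32_t ext_model = (sig >> 16) & 0xf;

	s.stepping = sig & 0xf;
	s.family = base_family;
	s.model = base_model;

	/* The extended model supplies the high nibble of the model. */
	if (base_family == 0x6 || base_family == 0xf)
		s.model += ext_model << 4;
	/* The extended family is added on top of base family 0xf. */
	if (base_family == 0xf)
		s.family += ext_family;

	return s;
}
```

Decoding once and storing the result, as the commit describes, means
later checks compare plain numbers instead of repeating the bit
extraction.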


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.125 12-Jul-2022 jsg

remove cache parts of struct cpu_info only vmm used
suggested by and ok mlarkin@


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
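
For context, a minimal sketch of the issue: `1 << 31` shifts a signed int
into its sign bit, which C leaves undefined, while the same shift on an
unsigned operand is well defined. The helper name below is hypothetical
and is not taken from the commit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return a 32-bit mask with only bit n set,
 * for 0 <= n <= 31. The operand is unsigned, so reaching bit 31
 * does not overflow a signed int.
 */
static inline uint32_t
bit32(unsigned int n)
{
	return (uint32_t)1 << n;
}
```

Writing flag definitions with an unsigned constant (for example `1U << 31`)
has the same effect without a helper.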


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
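
The splitting described above can be sketched as follows. This is an
illustration under stated assumptions, not the kernel's code: the struct
and function names are hypothetical, and the caller is assumed to have
already read the initial local APIC ID, ApicIdSize, and ThreadsPerCore
from CPUID.

```c
#include <stdint.h>

/* Hypothetical result type for the three topology IDs. */
struct cpu_topo {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/*
 * Cut an initial local APIC ID into package, core, and smt values.
 * apic_id_size is the number of low bits that identify a logical
 * CPU inside one package; threads_per_core defaults to 1 when the
 * CPU does not report a higher value.
 */
static struct cpu_topo
split_apic_id(uint32_t apic_id, unsigned int apic_id_size,
    uint32_t threads_per_core)
{
	struct cpu_topo t;
	uint32_t in_pkg;

	if (threads_per_core == 0)
		threads_per_core = 1;

	if (apic_id_size >= 32) {
		/* Avoid an out-of-range shift; everything is in-package. */
		t.pkg_id = 0;
		in_pkg = apic_id;
	} else {
		t.pkg_id = apic_id >> apic_id_size;
		in_pkg = apic_id & (((uint32_t)1 << apic_id_size) - 1);
	}

	/* Divide the remaining bits up by the threads-per-core count. */
	t.core_id = in_pkg / threads_per_core;
	t.smt_id = in_pkg % threads_per_core;

	return t;
}
```

With an APIC ID of 0x1b, an ApicIdSize of 4, and two threads per core,
this yields package 1, core 5, smt 1.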


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
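
The arithmetic behind such a recalibration can be sketched as follows. This
is only an illustration, not the committed code: the function and parameter
names are hypothetical, and it assumes the measurement window is short enough
that the 64-bit product does not overflow.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the TSC frequency in Hz from two deltas
 * taken over the same interval.
 *   tsc_delta - TSC ticks elapsed
 *   ref_delta - ticks elapsed on the reference timecounter (HPET, PM timer)
 *   ref_freq  - known frequency of the reference timecounter in Hz
 * Assumes tsc_delta * ref_freq fits in 64 bits (short measurement window).
 */
static uint64_t
tsc_freq_from_ref(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_freq)
{
	if (ref_delta == 0)
		return 0;	/* no usable measurement */
	return tsc_delta * ref_freq / ref_delta;
}
```

The result can be no more precise than the reference counter used for the
measurement, which is the point the commit message makes.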


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that the TSC is not used when we really don't want it. The
kernel bumps the quality to 2000 for constant, invariant TSCs on the
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
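
A minimal sketch of the kind of guard described above, assuming the maximum
basic and extended leaf numbers have already been read from CPUID. The
function name and parameters are hypothetical; max_basic plays the role of
cpuid_level and max_ext the role of ci_pnfeatset.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return nonzero if a CPUID leaf may be queried.
 *   max_basic - highest basic leaf reported by CPUID(0)
 *   max_ext   - highest extended leaf reported by CPUID(0x80000000),
 *               or 0 if extended leaves are not available
 */
static int
cpuid_leaf_usable(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000U)
		return max_ext >= 0x80000000U && leaf <= max_ext;
	return leaf <= max_basic;
}
```

Checking the reported maximum rather than a feature flag covers the case
where firmware has disabled a leaf that the flags would seem to imply.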


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@
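
The preference order can be sketched as below. This is an illustration
only: the two sources are passed as function pointers so the sketch stays
portable, and every name in it is hypothetical. The stubs stand in for the
real instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* A source returns 1 and stores a value on success, 0 when it has no data. */
typedef int (*hw_rng_fn)(uint64_t *);

/*
 * Hypothetical helper: try seed_src (standing in for rdseed) first; when
 * it is absent or empty, fall back to rand_src (standing in for rdrand).
 */
static int
hw_entropy(hw_rng_fn seed_src, hw_rng_fn rand_src, uint64_t *out)
{
	if (seed_src != NULL && seed_src(out))
		return 1;
	if (rand_src != NULL && rand_src(out))
		return 1;
	return 0;
}

/* Illustrative stubs only: an exhausted source and a working one. */
static int
stub_empty(uint64_t *v)
{
	(void)v;
	return 0;
}

static int
stub_value(uint64_t *v)
{
	*v = 42;
	return 1;
}
```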


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
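
As a rough illustration of why Tj(Max) matters: the thermal status register
reports how far the core is below Tj(Max), so the absolute temperature
depends on knowing that limit. The sketch below assumes the bit layout given
in Intel's documentation (digital readout in bits 22:16 of the thermal status
MSR, target in bits 23:16 of the temperature target MSR); the function name
is hypothetical and this is not the committed code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the core temperature in degrees Celsius
 * from raw MSR values.
 *   therm_status - raw thermal status value (readout assumed in bits 22:16)
 *   temp_target  - raw temperature target value (Tj(Max) assumed in 23:16)
 */
static int
core_temp_celsius(uint64_t therm_status, uint64_t temp_target)
{
	int tjmax = (int)((temp_target >> 16) & 0xff);
	int below = (int)((therm_status >> 16) & 0x7f);

	return tjmax - below;
}
```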


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
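
A sketch of the whole clean-up, assuming the brand string is a NUL-terminated
buffer that may be edited in place. It drops leading spaces, collapses runs
of spaces, and drops trailing spaces. The function name is hypothetical and
the kernel code may be structured differently.

```c
#include <stddef.h>

/*
 * Hypothetical helper: normalize spaces in a brand string in place and
 * return the new length.
 */
static size_t
trim_brand(char *s)
{
	char *src = s, *dst = s;

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* Drop a space that is followed by another space or the end. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```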


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
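
For reference, the extraction being centralised looks roughly like this. It
follows the convention in vendor CPUID documentation for the CPUID(1) EAX
signature. The struct and function names are hypothetical, and the exact
conditions in the kernel may differ.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature fields. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/*
 * Hypothetical helper: decode family/model/stepping from CPUID(1).EAX,
 * applying the extended fields only where the convention says so.
 */
static struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig s;

	s.stepping = eax & 0xf;
	s.model = (eax >> 4) & 0xf;
	s.family = (eax >> 8) & 0xf;

	/* Extended model applies to base families 0x6 and 0xf. */
	if (s.family == 0x6 || s.family == 0xf)
		s.model |= ((eax >> 16) & 0xf) << 4;
	/* Extended family is added only when the base family is 0xf. */
	if (s.family == 0xf)
		s.family += (eax >> 20) & 0xff;
	return s;
}
```

Doing this once and storing the result avoids repeating the bit fiddling at
every place that needs the family or model.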


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
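
The usual remedy is to do the shift on an unsigned operand. With a 32-bit
int, `1 << 31` is not representable in the signed type, while `1U << 31` is
well defined. A minimal sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Hypothetical helper: build a mask for bit n (0..31) of a 32-bit
 * register. The operand is unsigned, so shifting into bit 31 is well
 * defined.
 */
static inline uint32_t
bit32(unsigned int n)
{
	return (uint32_t)1U << n;
}
```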


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids are divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
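
The split described above can be sketched as follows. This illustrates the
idea only and is not the committed code: the struct and function names are
hypothetical, apic_id_size stands for the "ApicIdSize" bit count, and
threads_per_core for the "ThreadsPerCore" value.

```c
#include <stdint.h>

/* Hypothetical result type; the fields echo ci_pkg_id/ci_core_id/ci_smt_id. */
struct topo_ids {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/*
 * Hypothetical helper: cut an initial local APIC ID into package, core
 * and SMT ids.
 *   apic_id_size     - number of low bits that address cores/threads
 *                      inside one package
 *   threads_per_core - 1 unless the CPU reports a higher value
 */
static struct topo_ids
split_apic_id(uint32_t apic_id, unsigned int apic_id_size,
    unsigned int threads_per_core)
{
	struct topo_ids t;
	uint32_t mask, in_pkg;

	if (threads_per_core == 0)
		threads_per_core = 1;

	/* Avoid shifting a 32-bit value by 32 or more bits. */
	if (apic_id_size >= 32) {
		mask = 0xffffffffU;
		t.pkg_id = 0;
	} else {
		mask = (1U << apic_id_size) - 1;
		t.pkg_id = apic_id >> apic_id_size;
	}

	/* What is left after masking off the package is core + thread. */
	in_pkg = apic_id & mask;
	t.core_id = in_pkg / threads_per_core;
	t.smt_id = in_pkg % threads_per_core;
	return t;
}
```

For example, APIC ID 0x1b with a 4-bit ApicIdSize and two threads per core
splits into package 1, core 5, thread 1.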


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used; otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
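
The read-before-write pattern can be sketched like this. The MSR access
itself is left out so the sketch stays portable: the helper only decides
whether a write is needed and which value to write. The names are
hypothetical; bit 1 is the serialization bit of MSR C001_1029 named in the
revision 1.103 commit message.

```c
#include <stdint.h>

/* Bit 1 of MSR C001_1029 makes LFENCE dispatch serializing. */
#define LFENCE_SERIALIZE_BIT	(1ULL << 1)

/*
 * Hypothetical helper: given the current MSR value, return the value
 * with the serialization bit set, and report through *need_write
 * whether the MSR has to be written at all.
 */
static uint64_t
lfence_msr_update(uint64_t cur, int *need_write)
{
	*need_write = (cur & LFENCE_SERIALIZE_BIT) == 0;
	return cur | LFENCE_SERIALIZE_BIT;
}
```

Skipping the write when the bit is already set avoids touching the MSR under
a hypervisor that faults on such writes.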


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on an single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
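
The calibration described above reduces to scaling the TSC tick count by
the known frequency of the reference counter. A minimal sketch, assuming
both counters were sampled over the same interval; the function name and
parameters are illustrative and not taken from the kernel source:

```c
#include <stdint.h>

/*
 * Estimate the TSC frequency in Hz from a measurement window.
 * tsc_delta: TSC ticks elapsed during the window
 * ref_delta: reference counter ticks (e.g. HPET or PM timer) in the window
 * ref_freq:  known frequency of the reference counter in Hz
 * Returns 0 if the reference counter did not advance.
 * Hypothetical helper: the multiplication can overflow 64 bits for very
 * long windows, so callers are assumed to keep the window short.
 */
static uint64_t
tsc_freq_from_reference(uint64_t tsc_delta, uint64_t ref_delta,
    uint64_t ref_freq)
{
	if (ref_delta == 0)
		return 0;
	return tsc_delta * ref_freq / ref_delta;
}
```

The result can only be as precise as the reference counter, which is the
point the commit message makes about resolution.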


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
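
The check described above can be sketched as a range test against the
maximum leaf numbers the CPU reports. This is an illustration only: the
function name is hypothetical, and max_basic and max_ext stand in for the
values returned in EAX by CPUID(0) and CPUID(0x80000000):

```c
#include <stdint.h>

/*
 * Return nonzero if a CPUID leaf may be queried.
 * max_basic: highest basic leaf, from CPUID(0).EAX
 * max_ext:   highest extended leaf, from CPUID(0x80000000).EAX
 * A feature flag alone is not trusted, since firmware can lower
 * the maximum leaf below what the flags would suggest.
 */
static int
cpuid_leaf_available(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000U)
		return max_ext >= 0x80000000U && leaf <= max_ext;
	return leaf <= max_basic;
}
```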


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
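
Since the sensor reports how far the core is below Tj(Max), the absolute
temperature is a subtraction. A rough sketch; the bit positions used here
(target in bits 23:16 of MSR_TEMPERATURE_TARGET, digital readout in bits
22:16 of the thermal status MSR) are assumptions based on Intel's
documentation, and the helper names are hypothetical:

```c
#include <stdint.h>

/* Extract Tj(Max) in degrees C from a raw MSR_TEMPERATURE_TARGET value. */
static int
tjmax_from_target(uint64_t msr_target)
{
	return (int)((msr_target >> 16) & 0xff);
}

/*
 * Convert a raw thermal status value to degrees C.
 * The digital readout counts degrees below Tj(Max).
 */
static int
core_temp_celsius(uint64_t therm_status, int tjmax)
{
	return tjmax - (int)((therm_status >> 16) & 0x7f);
}
```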


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
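
Together with the earlier removal of leading and duplicated spaces, the
cleanup amounts to an in-place squeeze of the string. A minimal sketch,
not the kernel's actual code; squeeze_spaces() is a hypothetical name:

```c
/*
 * Normalize a NUL-terminated string in place: drop leading spaces,
 * collapse runs of spaces to one, and drop trailing spaces.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	/* skip leading spaces */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* drop a space that is followed by a space or the end */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```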


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@
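
For readers unfamiliar with the macro, nitems() yields the element count
of a true array. A short sketch; the fallback definition mirrors the
usual BSD one, and the table below is made up for illustration:

```c
#include <stddef.h>

/* Fallback for systems whose headers do not provide nitems(). */
#ifndef nitems
#define nitems(_a)	(sizeof((_a)) / sizeof((_a)[0]))
#endif

/* Hypothetical table, only here to have something to count. */
static const char *const example_flags[] = { "SSE3", "PCLMUL", "AES" };

/* Count entries without a hand-written sizeof division. */
static size_t
example_flag_count(void)
{
	return nitems(example_flags);
}
```

The macro only works on arrays, not on pointers an array has decayed to.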


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.
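
As an illustration of the detection, the CLFLUSH line size is reported by
CPUID leaf 1 in EBX bits 15:8, in units of 8 bytes, and is meaningful
when the CLFSH feature bit (EDX bit 19) is set. A sketch operating on
already-fetched register values; the helper names are hypothetical:

```c
#include <stdint.h>

#define CPUID1_EDX_CLFSH	(1U << 19)	/* CLFLUSH supported */

/* Return nonzero if CPUID(1).EDX advertises CLFLUSH. */
static int
has_clflush(uint32_t edx)
{
	return (edx & CPUID1_EDX_CLFSH) != 0;
}

/* Return the CLFLUSH line size in bytes from CPUID(1).EBX. */
static unsigned int
clflush_line_size(uint32_t ebx)
{
	return ((ebx >> 8) & 0xff) * 8;
}
```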


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
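
For reference, the conditional extension works as follows: the extended
family is added only when the base family is 0xf, and the extended model
is used only for base family 0x6 or 0xf. A sketch of the decoding from
the CPUID(1).EAX signature; function names are illustrative, not the
struct cpu_info members:

```c
#include <stdint.h>

/* Effective family: base family, plus extended family if base is 0xf. */
static unsigned int
cpu_family(uint32_t sig)
{
	unsigned int family = (sig >> 8) & 0xf;

	if (family == 0xf)
		family += (sig >> 20) & 0xff;
	return family;
}

/* Effective model: extended model forms the high nibble for 0x6/0xf. */
static unsigned int
cpu_model(uint32_t sig)
{
	unsigned int family = (sig >> 8) & 0xf;
	unsigned int model = (sig >> 4) & 0xf;

	if (family == 0x6 || family == 0xf)
		model += ((sig >> 16) & 0xf) << 4;
	return model;
}
```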


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism, such
as powernow, to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.124 26-Apr-2022 claudio

No need for line wrap here.


# 1.123 26-Apr-2022 claudio

On CPUs that have MPERF/APERF support use that information to install a
cpu frequency sensor for each core. This works on many "modern" Intel and
AMD cpus (probably anything that has some kind of turbo mode).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
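
The problem is that 1 << 31 shifts a signed int into its sign bit, which
the C standard leaves undefined; an unsigned operand makes the shift well
defined. A small sketch of the safe form, with hypothetical names:

```c
#include <stdint.h>

/* Unsigned constant: bit 31 can be set without undefined behavior. */
#define FLAG_BIT31	(1U << 31)

/* Return a 32-bit mask with only bit n set; n must be in 0..31. */
static inline uint32_t
bit32(unsigned int n)
{
	return (uint32_t)1 << n;
}
```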


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
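
One reading of the scheme above, sketched in C: bits above ApicIdSize
give the package, and the remainder is split by ThreadsPerCore. This is
an interpretation of the commit message, not the kernel's code; the
struct and function names are hypothetical, and the use of division and
modulo for the core/thread split is an assumption:

```c
#include <stdint.h>

struct topo {
	uint32_t pkg_id;	/* package (socket) */
	uint32_t core_id;	/* core within the package */
	uint32_t smt_id;	/* thread within the core */
};

/*
 * Cut an initial local APIC ID into package, core, and smt values.
 * apic_id_size:     number of low bits holding core/thread ids
 * threads_per_core: 1 unless the CPU reports a higher value
 */
static struct topo
split_apic_id(uint32_t apicid, unsigned int apic_id_size,
    uint32_t threads_per_core)
{
	struct topo t;
	uint32_t mask, rest;

	if (threads_per_core == 0)
		threads_per_core = 1;	/* default described above */

	/* avoid shifting a 32-bit value by 32 or more */
	if (apic_id_size >= 32) {
		mask = 0xffffffffU;
		t.pkg_id = 0;
	} else {
		mask = (1U << apic_id_size) - 1;
		t.pkg_id = apicid >> apic_id_size;
	}

	rest = apicid & mask;
	t.core_id = rest / threads_per_core;
	t.smt_id = rest % threads_per_core;
	return t;
}
```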


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
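
The idea is a read-test-write sequence that skips the write when nothing
would change. A sketch on plain values rather than real MSR accesses; the
bit position follows the MSR C001_1029[1] description in rev 1.103, and
the helper name is hypothetical:

```c
#include <stdint.h>

/* LFENCE dispatch-serializing bit, MSR C001_1029 bit 1. */
#define LFENCE_SERIALIZE	(1ULL << 1)

/*
 * Decide whether the MSR must be written.
 * cur:    value just read from the MSR
 * newval: receives the value to write, only when a write is needed
 * Returns 1 if a write is needed, 0 if the bit is already set.
 */
static int
lfence_msr_needs_write(uint64_t cur, uint64_t *newval)
{
	if (cur & LFENCE_SERIALIZE)
		return 0;
	*newval = cur | LFENCE_SERIALIZE;
	return 1;
}
```

Skipping the redundant write is what avoids the #GP fault under the
hypervisor mentioned above.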


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty, fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isnt reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
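
For illustration, a minimal sketch of how the extended family/model
values are conventionally derived from the CPUID(1) signature,
following the rule documented in the vendor manuals. This is not the
code from this commit; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative only; struct and function names are hypothetical. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;

	s.stepping = sig & 0x0f;
	s.model = (sig >> 4) & 0x0f;
	s.family = (sig >> 8) & 0x0f;

	/* Extended model is added for base family 0x6 and 0xf. */
	if (s.family == 0x6 || s.family == 0xf)
		s.model += ((sig >> 16) & 0x0f) << 4;
	/* Extended family is added only for base family 0xf. */
	if (s.family == 0xf)
		s.family += (sig >> 20) & 0xff;
	return s;
}
```

Decoding once and storing the result avoids repeating these shifts at
every place that needs the family or model.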


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@
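
A minimal sketch of the kind of cleanup described above, assuming the
brand string is an ordinary NUL-terminated buffer edited in place. The
helper name is hypothetical and this is not the committed code.

```c
/*
 * Hypothetical helper: drop leading spaces and collapse runs of
 * spaces to a single space, editing the string in place.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;
	int prev_space = 1;	/* start as if a space came before */

	for (; *src != '\0'; src++) {
		if (*src == ' ') {
			if (prev_space)
				continue;	/* skip leading or repeated space */
			prev_space = 1;
		} else
			prev_space = 0;
		*dst++ = *src;
	}
	*dst = '\0';
}
```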


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
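
As a small illustration of the issue (not the actual diff): with a
32-bit int, the expression 1 << 31 overflows a signed type, while
shifting an unsigned value is well defined. The helper name below is
hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: mask for bit n, valid for n in 0..31.
 * The cast makes the shift operate on an unsigned 32-bit value,
 * so shifting into bit 31 is well defined.
 */
static inline uint32_t
bit32(unsigned int n)
{
	return (uint32_t)1 << n;
}
```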


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
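
The splitting described above can be sketched roughly as follows. This
is an illustration under stated assumptions, not the committed code:
the struct, function, and parameter names are hypothetical,
apic_id_size is assumed to be less than 32, and a ThreadsPerCore of 0
is treated as 1.

```c
#include <stdint.h>

/* Illustrative only; names are hypothetical. */
struct topo {
	uint32_t pkg;
	uint32_t core;
	uint32_t smt;
};

static struct topo
split_apicid(uint32_t apicid, unsigned int apic_id_size,
    uint32_t threads_per_core)
{
	struct topo t;
	uint32_t within;

	if (threads_per_core == 0)
		threads_per_core = 1;	/* default when not reported */

	/* Bits at and above apic_id_size identify the package. */
	t.pkg = apicid >> apic_id_size;
	/* Mask off the package bits, leaving the core/thread id. */
	within = apicid & (((uint32_t)1 << apic_id_size) - 1);
	/* Divide what is left into a core id and an smt id. */
	t.core = within / threads_per_core;
	t.smt = within % threads_per_core;
	return t;
}
```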


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty, fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
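
Illustrative sketch of why Tj(Max) matters, not the committed code. It assumes the layout documented in the Intel SDM: the thermal status MSR carries a "degrees below Tj(Max)" readout in bits 22:16, and the temperature target MSR carries Tj(Max) in bits 23:16. The function name therm_celsius is hypothetical, and the raw MSR values are passed in because reading MSRs needs kernel privilege.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert raw MSR contents to degrees Celsius.
 * therm_status - raw thermal status value (readout in bits 22:16, assumed)
 * temp_target  - raw temperature target value (Tj(Max) in bits 23:16, assumed)
 */
int
therm_celsius(uint64_t therm_status, uint64_t temp_target)
{
	int tjmax = (int)((temp_target >> 16) & 0xff);
	int below = (int)((therm_status >> 16) & 0x7f);

	/* The sensor reports distance from Tj(Max), not an absolute value. */
	return tjmax - below;
}
```

A hard-coded Tj(Max) shifts every reading by the same error, which is why taking the value from the processor should make the sensor more accurate.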


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
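
Illustrative sketch of the brand string cleanup, not the committed code. It combines the leading and duplicated space removal from rev 1.3 with the trailing space removal added here. The function name squeeze_spaces is hypothetical.

```c
/*
 * Hypothetical helper: normalize a NUL-terminated brand string in place.
 * Drops leading spaces, collapses runs of spaces to one, and drops
 * trailing spaces.
 */
void
squeeze_spaces(char *s)
{
	char *in = s, *out = s;

	/* Skip leading spaces. */
	while (*in == ' ')
		in++;

	while (*in != '\0') {
		/* Drop a space that is followed by a space or the end. */
		if (*in == ' ' && (in[1] == ' ' || in[1] == '\0')) {
			in++;
			continue;
		}
		*out++ = *in++;
	}
	*out = '\0';
}
```

Because the output pointer never passes the input pointer, the rewrite is safe to do in the same buffer.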


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and core relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
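
Illustrative sketch of the extraction this revision centralizes, not the committed code. It assumes the CPUID(1).eax signature layout and the extension rule documented by Intel: the extended model applies when the base family is 0x6 or 0xf, and the extended family applies when the base family is 0xf. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the decoded values. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/* Decode a CPUID(1).eax signature into family/model/stepping. */
struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;

	s.stepping = sig & 0x0f;
	s.model = (sig >> 4) & 0x0f;
	s.family = (sig >> 8) & 0x0f;

	/* Extended model occupies bits 19:16 and forms the high nibble. */
	if (s.family == 0x6 || s.family == 0xf)
		s.model += ((sig >> 16) & 0x0f) << 4;

	/* Extended family occupies bits 27:20 and is added to the base. */
	if (s.family == 0xf)
		s.family += (sig >> 20) & 0xff;

	return s;
}
```

Computing the values once and storing them avoids having every consumer repeat the shifts and the "conditionally extended" rule.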


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.121 02-Nov-2021 mlarkin

Remove trailing whitespace


Revision tags: OPENBSD_7_0_BASE
# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
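
Illustrative sketch of the splitting scheme described above, not the committed code. It assumes the initial local APIC ID, the ApicIdSize (in bits) and ThreadsPerCore have already been read from CPUID. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the three topology IDs. */
struct cpu_topo {
	uint32_t pkg;
	uint32_t core;
	uint32_t smt;
};

/*
 * Cut an initial local APIC ID into package, core and smt IDs.
 * apic_id_size    - number of low bits holding the core/thread part
 * threads_per_core - 1 unless the CPU reports a higher value
 */
struct cpu_topo
split_apicid(uint32_t apicid, uint32_t apic_id_size, uint32_t threads_per_core)
{
	struct cpu_topo t;
	uint32_t mask, in_pkg;

	/* Default to one thread per core, as the commit describes. */
	if (threads_per_core == 0)
		threads_per_core = 1;

	/* Guard against shifting a 32-bit value by 32 or more. */
	if (apic_id_size >= 32) {
		mask = 0xffffffffU;
		t.pkg = 0;
	} else {
		mask = (1U << apic_id_size) - 1;
		t.pkg = apicid >> apic_id_size;
	}

	/* What is left after masking off the package is core/thread. */
	in_pkg = apicid & mask;
	t.core = in_pkg / threads_per_core;
	t.smt = in_pkg % threads_per_core;

	return t;
}
```

For example, APIC ID 0x13 with a 4-bit ApicIdSize and two threads per core lands in package 1, core 1, thread 1.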


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used; otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, If the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
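
Illustrative sketch of the "only write if not already set" logic, not the committed code. It assumes the LFENCE serialization bit is bit 1 of MSR C001_1029, as stated in rev 1.103. The macro and function names are hypothetical, and the current MSR value is passed in because MSR access needs kernel privilege.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of MSR C001_1029 (per rev 1.103); the macro name is hypothetical. */
#define LFENCE_SERIALIZE_BIT	(1ULL << 1)

/*
 * Decide whether the MSR must be written.
 * Returns false and leaves *newval untouched if the bit is already set,
 * otherwise stores the value to write in *newval and returns true.
 */
bool
lfence_msr_update(uint64_t cur, uint64_t *newval)
{
	if (cur & LFENCE_SERIALIZE_BIT)
		return false;

	*newval = cur | LFENCE_SERIALIZE_BIT;
	return true;
}
```

Skipping the write when the bit is already set avoids touching the MSR at all, which is what sidesteps the #GP fault seen under the affected hypervisor.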


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
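
As a rough illustration of what "conditionally extended" means in 1.16,
the sketch below derives the display family and model from the CPUID(1)
EAX signature using the rule documented by Intel and AMD. The helper
names are hypothetical and are not taken from identcpu.c, which keeps
the results in struct cpu_info.

```c
#include <stdint.h>

/* Hypothetical helpers; field layout follows the CPUID(1) EAX signature. */
static uint32_t
sig_family(uint32_t sig)
{
	uint32_t family = (sig >> 8) & 0x0f;

	/* The extended family (bits 27:20) only counts when the base is 0xf. */
	if (family == 0x0f)
		family += (sig >> 20) & 0xff;
	return family;
}

static uint32_t
sig_model(uint32_t sig)
{
	uint32_t family = (sig >> 8) & 0x0f;
	uint32_t model = (sig >> 4) & 0x0f;

	/* The extended model (bits 19:16) supplies the high nibble. */
	if (family == 0x06 || family == 0x0f)
		model |= ((sig >> 16) & 0x0f) << 4;
	return model;
}
```

Under these assumptions a signature of 0x00800f11 yields family 0x17
and model 0x01, and 0x000906ea yields family 0x06 and model 0x9e.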


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@
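
The brand string cleanup in 1.3 can be pictured with the small sketch
below. It is an assumption-laden illustration, not the committed code:
squeeze_spaces() is a hypothetical name, and it also drops trailing
spaces, which the log shows being added in a later revision.

```c
#include <stddef.h>

/*
 * Hypothetical helper: rewrite s in place without leading spaces,
 * without runs of repeated spaces, and without trailing spaces.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* Drop a space that precedes another space or the end. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

The string is edited in place, so the caller needs a writable buffer
such as the array that holds the CPUID brand string.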


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.120 31-Aug-2021 patrick

Identify the paravirtual bus earlier, as we need to make sure that we have
a working delay func ready before the first occurrence of delay(). This is
necessary on Hyper-V Gen 2 VMs where we don't use the TSC.

Discussed with the hackroom
ok kettenis@


# 1.119 31-Aug-2021 kettenis

Use the TSC delay(9) backend earlier on machines where we can. Also use
the TSC for delays even if there is a skew between the TSCs of the cores
as this doesn't matter for delay(9).

Gets rid of the unreasonable clock speed reports on Intel Tiger Lake CPUs
where the i8254 behaves in weird ways.

ok patrick@, deraadt@, mlarkin@


Revision tags: OPENBSD_6_9_BASE
# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
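
The split described in 1.114 can be sketched roughly as follows. This
is a minimal illustration under stated assumptions, not the committed
code: struct topo_ids and split_apic_id() are hypothetical names,
apic_id_size stands for the "ApicIdSize" bit count, and
threads_per_core for the ThreadsPerCore value (1 when the CPU does not
report a higher one).

```c
#include <stdint.h>

/* Hypothetical container for the three topology values. */
struct topo_ids {
	uint32_t pkg;
	uint32_t core;
	uint32_t smt;
};

/*
 * Cut an initial local APIC ID into package, core and smt values.
 * The bits above apic_id_size name the package; the remaining bits
 * are divided up by threads_per_core.
 */
static struct topo_ids
split_apic_id(uint32_t apic_id, unsigned int apic_id_size,
    unsigned int threads_per_core)
{
	struct topo_ids t;
	uint32_t mask, inpkg;

	if (threads_per_core == 0)
		threads_per_core = 1;

	/* Avoid an undefined shift when the size covers all 32 bits. */
	if (apic_id_size >= 32) {
		mask = 0xffffffffu;
		t.pkg = 0;
	} else {
		mask = (1u << apic_id_size) - 1;
		t.pkg = apic_id >> apic_id_size;
	}

	inpkg = apic_id & mask;
	t.core = inpkg / threads_per_core;
	t.smt = inpkg % threads_per_core;
	return t;
}
```

With an APIC ID of 0x2b, an ApicIdSize of 4 and two threads per core,
this sketch gives package 2, core 5 and smt 1.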


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariants TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is joint work with jsg@. The fixed-function
performance counter read code comes from a former diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
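
As an illustration of the "conditionally extended" family/model values
mentioned in 1.16, here is a minimal sketch of decoding the CPUID(1).eax
signature. It is not the identcpu.c code: the struct and function names
are hypothetical, and it follows Intel's documented rule, where the
extended model applies to base families 0x6 and 0xf (AMD documents it
for 0xf only).

```c
#include <stdint.h>

/* Hypothetical container; the kernel keeps these in struct cpu_info. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/* Decode a raw CPUID(1).eax signature into family/model/stepping. */
static struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig s;
	uint32_t base_family = (eax >> 8) & 0xf;
	uint32_t ext_family = (eax >> 20) & 0xff;
	uint32_t ext_model = (eax >> 16) & 0xf;

	s.stepping = eax & 0xf;
	s.family = base_family;
	s.model = (eax >> 4) & 0xf;

	/* The extended family is only added when the base family is 0xf. */
	if (base_family == 0xf)
		s.family += ext_family;

	/* The extended model supplies the high nibble of the model. */
	if (base_family == 0x6 || base_family == 0xf)
		s.model |= ext_model << 4;

	return s;
}
```

For example, the signature 0x000306a9 decodes to family 0x6, model 0x3a,
stepping 9, and 0x00800f11 decodes to family 0x17, model 0x1, stepping 1.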


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.118 31-Dec-2020 jsg

remove pv includes which were missed in rev 1.70


Revision tags: OPENBSD_6_8_BASE
# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
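
The split described in 1.114 can be sketched roughly as follows. This is
an illustration under stated assumptions, not the committed code: the
struct and function names are hypothetical, the caller is assumed to have
already read the initial local APIC id, ApicIdSize and ThreadsPerCore
from CPUID, and using division and remainder to separate core and smt ids
is one reading of "divided up by the ThreadsPerCore value".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; the kernel stores these in struct cpu_info. */
struct topo_ids {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/*
 * Cut an initial local APIC id into package, core and smt ids.
 * apic_id_size is the number of low bits holding the core/thread part;
 * threads_per_core defaults to 1 when the CPU does not report it.
 */
static struct topo_ids
split_apic_id(uint32_t apic_id, unsigned int apic_id_size,
    unsigned int threads_per_core)
{
	struct topo_ids t;
	uint32_t mask, rest;

	assert(apic_id_size < 32);
	if (threads_per_core == 0)
		threads_per_core = 1;

	/* Everything above the low apic_id_size bits names the package. */
	mask = (1U << apic_id_size) - 1;
	t.pkg_id = apic_id >> apic_id_size;

	/* The remaining bits are shared between core and thread ids. */
	rest = apic_id & mask;
	t.core_id = rest / threads_per_core;
	t.smt_id = rest % threads_per_core;

	return t;
}
```

With an APIC id of 0x1b, an ApicIdSize of 4 and two threads per core,
this yields package 1, core 5, smt 1.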


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, If the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is joint work with jsg@. The fixed-function
performance counter read code comes from a former diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
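
As an illustration of the extraction this entry refers to, here is a minimal
sketch of decoding family, model and stepping from a CPUID(1).EAX signature.
It follows the convention in Intel's documentation (the extended fields are
applied conditionally); the struct and function names are hypothetical and
this is not the code from identcpu.c.

```c
#include <stdint.h>

/* Hypothetical holder for the decoded signature fields. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/*
 * Decode a CPUID(1).EAX signature.  The extended family is added only
 * when the base family is 0xf; the extended model is applied only when
 * the base family is 0x6 or 0xf.
 */
static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;
	uint32_t base_family = (sig >> 8) & 0x0f;

	s.stepping = sig & 0x0f;
	s.model = (sig >> 4) & 0x0f;
	s.family = base_family;

	if (base_family == 0x0f)
		s.family += (sig >> 20) & 0xff;
	if (base_family == 0x06 || base_family == 0x0f)
		s.model += ((sig >> 16) & 0x0f) << 4;

	return s;
}
```

Computing the values once and storing them, as the commit describes, lets later
code compare against a single family/model pair instead of repeating the shifts.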


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others, thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.117 13-Sep-2020 jsg

add SRBDS cpuid bits


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
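
The splitting described in this entry can be sketched as follows. This is only
an illustration under stated assumptions: the struct, the function name and
the parameter names are hypothetical, the inputs are taken as already read
from cpuid, and the real identcpu.c code may differ in detail.

```c
#include <stdint.h>

/* Hypothetical container for the three topology values. */
struct topo {
	uint32_t pkg;
	uint32_t core;
	uint32_t smt;
};

/*
 * Split an initial local APIC id into package, core and SMT ids.
 * apic_id_size is the number of low bits that address the cores and
 * threads inside one package; threads_per_core is treated as 1 when
 * the CPU does not report a larger value.
 */
static struct topo
split_apicid(uint32_t apicid, unsigned int apic_id_size,
    uint32_t threads_per_core)
{
	struct topo t;
	uint32_t inner;

	if (threads_per_core == 0)
		threads_per_core = 1;

	if (apic_id_size >= 32) {
		/* Degenerate case: every bit is inside the package. */
		t.pkg = 0;
		inner = apicid;
	} else {
		t.pkg = apicid >> apic_id_size;
		inner = apicid & ((1u << apic_id_size) - 1);
	}

	/* What is left after masking off the package is core/thread. */
	t.core = inner / threads_per_core;
	t.smt = inner % threads_per_core;
	return t;
}
```

For example, an APIC id of 0x1b with four inner bits and two threads per core
yields package 1, core 5, thread 1.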


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
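
A minimal sketch of the read-before-write check this entry describes, written
as a pure function so it can be followed without real MSR access. The bit
position (bit 1 of MSR C001_1029) is taken from the 1.103 log message; the
macro and function names are hypothetical, and a kernel would wrap this logic
around its own MSR read and write primitives.

```c
#include <stdint.h>

/* Bit 1 of MSR C001_1029, per the 1.103 log message (name is hypothetical). */
#define LFENCE_SERIALIZE_BIT	(1ULL << 1)

/*
 * Given the current MSR value, decide whether a write is needed.
 * Returns 1 and stores the value to write in *newval when the bit is
 * clear; returns 0 and leaves *newval untouched when the bit is
 * already set, so the caller can skip the write entirely.
 */
static int
lfence_msr_update(uint64_t cur, uint64_t *newval)
{
	if (cur & LFENCE_SERIALIZE_BIT)
		return 0;
	*newval = cur | LFENCE_SERIALIZE_BIT;
	return 1;
}
```

Skipping the write when the bit is already set is what avoids the faulting
path mentioned above on hypervisors that reject writes to this MSR.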


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
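
The measurement idea in this entry reduces to a ratio: count TSC ticks over an
interval timed by a reference counter of known frequency. The sketch below is
an assumption-laden illustration (the function and parameter names are
hypothetical, and the deltas are assumed to be sampled already), not the
calibration code that was committed.

```c
#include <stdint.h>

/*
 * Estimate the TSC frequency in Hz.
 *   tsc_delta: TSC ticks elapsed during the measurement window
 *   ref_delta: reference timecounter ticks elapsed in the same window
 *   ref_freq:  known frequency of the reference timecounter in Hz
 * Returns 0 when the window is empty, so the caller can fall back to
 * another estimate.  The window is assumed short enough that
 * tsc_delta * ref_freq fits in 64 bits.
 */
static uint64_t
estimate_tsc_freq(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_freq)
{
	if (ref_delta == 0)
		return 0;
	return tsc_delta * ref_freq / ref_delta;
}
```

The precision of the result is bounded by the resolution of the reference
counter, which is why the entry ties TSC quality to the timecounter used for
the measurement.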


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
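
The guard described in this entry can be sketched as a small predicate: a leaf
is queried only if it is at or below the maximum the CPU reports for its
range. The function and parameter names are hypothetical; max_basic stands in
for cpuid_level and max_ext for the highest extended leaf (what the log calls
ci_pnfeatset).

```c
#include <stdint.h>

/*
 * Return 1 if the given CPUID leaf may be queried, 0 otherwise.
 * Basic leaves are checked against max_basic, extended leaves
 * (0x80000000 and up) against max_ext.  A max_ext of 0 means the CPU
 * reported no extended leaves at all.
 */
static int
leaf_available(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000u)
		return leaf <= max_ext;
	return leaf <= max_basic;
}
```

With this check, a feature flag that implies leaf 0xd is not enough on its own:
if the firmware capped the basic level at 0xb, the leaf is simply not read.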


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
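
For illustration, the brand string cleanup mentioned here (together with the
leading and duplicated space removal from the much earlier 1.3 revision) can
be done in one in-place pass. This is a sketch with a hypothetical function
name, not the code in identcpu.c.

```c
/*
 * Normalize spaces in a NUL-terminated string in place: drop leading
 * spaces, collapse runs of spaces to one, and drop trailing spaces.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* Drop a space that is followed by a space or the end. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

Because the output is never longer than the input, no second buffer is needed.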


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is joint work with jsg@. The fixed-function
performance counter read code comes from an earlier diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initilization of the underlying setperf mechanism such
as powernow to mainbus from the cpu indentification and initilization
code inspired by similar changes dim@ made to i386 durring h2k6. This
is necessary to implement the AMD recommended method for retreiving
p_state data from the ACPI _PSS object (a diff comming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EMT64 capable of running OpenBSD/amd64.

MP setperf functionality verifed by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presense of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.116 08-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in identifycpu() on amd64.

OK deraadt@


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread ids are divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
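
A rough sketch of the ID split described in 1.114, for illustration only. It is not the committed code: the function name, the struct, and passing ApicIdSize and ThreadsPerCore as plain parameters are assumptions made here (the kernel reads them from cpuid).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three IDs; not the kernel's struct cpu_info. */
struct topo {
	uint32_t pkg;
	uint32_t core;
	uint32_t smt;
};

/*
 * Split an initial local APIC id into package/core/smt values.
 * apic_id_size: number of low bits that identify a core/thread within
 *               a package (the "ApicIdSize" of the commit message).
 * threads_per_core: 1 unless the CPU reports a higher value.
 */
static struct topo
split_apic_id(uint32_t apic_id, unsigned apic_id_size,
    unsigned threads_per_core)
{
	struct topo t;
	uint32_t low;

	assert(apic_id_size < 32 && threads_per_core >= 1);

	/* Everything above the ApicIdSize boundary is the package. */
	t.pkg = apic_id >> apic_id_size;

	/* Mask the package off, then divide the rest by ThreadsPerCore. */
	low = apic_id & ((1U << apic_id_size) - 1);
	t.core = low / threads_per_core;
	t.smt = low % threads_per_core;
	return t;
}
```

With apic_id 0x1d, an ApicIdSize of 4 and two threads per core, this yields package 1, core 6, smt 1.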


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used; otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
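
The arithmetic behind the recalibration in 1.88 can be sketched as follows. This is an illustration under assumptions, not the committed code: the function name and parameters are hypothetical, and the caller is assumed to have sampled both counters over the same interval.

```c
#include <stdint.h>

/*
 * Estimate the TSC frequency in Hz from a reference timecounter.
 * tsc_delta: TSC ticks elapsed during the measurement interval
 * ref_delta: reference counter ticks (e.g. HPET) over the same interval
 * ref_hz:    known frequency of the reference counter
 * Returns 0 if no reference ticks elapsed.  The product is computed in
 * 64 bits, so very long intervals could overflow; short ones do not.
 */
static uint64_t
tsc_freq_from_ref(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_hz)
{
	if (ref_delta == 0)
		return 0;
	return tsc_delta * ref_hz / ref_delta;
}
```

The result can only be as precise as the reference counter, which is the point made in the commit message.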


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.
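
For reference, the clflush line size is reported in CPUID leaf 1: EBX bits 15:8 give it in units of 8 bytes. A minimal sketch of the decode, assuming the caller already holds the EBX value; the function name is made up for illustration and is not taken from identcpu.c.

```c
#include <stdint.h>

/*
 * Decode the CLFLUSH line size in bytes from CPUID(1).EBX.
 * Bits 15:8 hold the size in 8-byte units.
 */
static unsigned
clflush_line_size(uint32_t cpuid1_ebx)
{
	return ((cpuid1_ebx >> 8) & 0xff) * 8;
}
```

An EBX value of 0x00100800 decodes to a 64-byte line.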


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA Nanos support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
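
"Conditionally extended" refers to the usual CPUID(1).EAX signature rule: the extended family is added only when the base family is 0xf, and the extended model is applied only for base family 6 or 0xf (Intel's documented rule). A sketch of that decode follows; the struct and function names are hypothetical and this is not the code from the commit.

```c
#include <stdint.h>

/* Hypothetical holder for the decoded values. */
struct cpu_sig {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/* Decode family/model/stepping from the CPUID(1).EAX signature. */
static struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig s;
	unsigned base_family = (eax >> 8) & 0xf;
	unsigned base_model = (eax >> 4) & 0xf;

	s.stepping = eax & 0xf;
	s.family = base_family;
	s.model = base_model;

	/* Extended family (bits 27:20) only counts when base family is 0xf. */
	if (base_family == 0xf)
		s.family += (eax >> 20) & 0xff;

	/* Extended model (bits 19:16) supplies the high nibble of the model. */
	if (base_family == 0x6 || base_family == 0xf)
		s.model += ((eax >> 16) & 0xf) << 4;

	return s;
}
```

The signature 0x000106a5 decodes to family 6, model 0x1a, stepping 5.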


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64; like its i386 counterpart, it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others, thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.115 27-May-2020 jsg

don't limit clflush to Intel CPUs

discussed with deraadt@


Revision tags: OPENBSD_6_7_BASE
# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread id is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@
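
The APIC ID split described in 1.114 can be sketched as follows. This is a
minimal illustration under stated assumptions, not the committed code: the
struct, function, and parameter names are hypothetical, and the ApicIdSize and
ThreadsPerCore values are assumed to have been read from CPUID already.

```c
#include <stdint.h>

/* Hypothetical container for the three topology values. */
struct topo_ids {
	uint32_t pkg;
	uint32_t core;
	uint32_t smt;
};

/*
 * Split an initial local APIC ID into package, core, and smt IDs.
 * apic_id_size is the number of low bits that address cores/threads
 * inside one package; threads_per_core is 1 unless the CPU reports more.
 */
static struct topo_ids
split_apic_id(uint32_t apicid, unsigned int apic_id_size,
    unsigned int threads_per_core)
{
	struct topo_ids t;
	uint32_t mask, thread_id;

	if (threads_per_core == 0)
		threads_per_core = 1;	/* default when not reported */

	/* Avoid an undefined shift if the size is out of range. */
	mask = (apic_id_size >= 32) ? 0xffffffffu : ((1u << apic_id_size) - 1);
	t.pkg = (apic_id_size >= 32) ? 0 : (apicid >> apic_id_size);

	thread_id = apicid & mask;	/* package bits masked off */
	t.core = thread_id / threads_per_core;
	t.smt = thread_id % threads_per_core;
	return t;
}
```

With an ApicIdSize of 4 and two threads per core, APIC ID 0x1b would map to
package 1, core 5, smt 1 in this sketch.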


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpus with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
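
The check-before-write behavior from 1.104 can be sketched as below. This is an
illustration only: the MSR is replaced by a hypothetical in-memory stand-in
(`struct fake_msr` and its helpers) so the logic runs in userland, and the bit
position is taken from the "MSR C001_1029[1]" wording in 1.103.

```c
#include <stdint.h>

#define LFENCE_SERIALIZE_BIT	(1ULL << 1)	/* MSR C001_1029 bit 1 */

/* Hypothetical stand-in for the real MSR; counts how often it is written. */
struct fake_msr {
	uint64_t value;
	unsigned int writes;
};

static uint64_t
fake_rdmsr(const struct fake_msr *m)
{
	return m->value;
}

static void
fake_wrmsr(struct fake_msr *m, uint64_t v)
{
	m->value = v;
	m->writes++;
}

/* Set the serialize bit only when it is not already set. */
static void
enable_lfence_serialize(struct fake_msr *m)
{
	uint64_t v = fake_rdmsr(m);

	if (v & LFENCE_SERIALIZE_BIT)
		return;		/* already set: skip the write entirely */
	fake_wrmsr(m, v | LFENCE_SERIALIZE_BIT);
}
```

Skipping the write when the bit is already set means a hypervisor that faults
on writes to this MSR is never touched in that case.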


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncacheable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
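
The measurement idea in 1.88 reduces to a ratio of tick counts over the same
interval. A minimal sketch, assuming the two deltas were sampled over one
interval and that the interval is short enough for the 64-bit product not to
overflow; the function and parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * Estimate the TSC frequency in Hz from tick deltas taken over the
 * same interval on the TSC and on a reference counter (for example
 * HPET or the PM timer) whose frequency ref_freq is known.
 * Returns 0 when the reference counter did not advance.
 */
static uint64_t
estimate_tsc_freq(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_freq)
{
	if (ref_delta == 0)
		return 0;
	/* Assumes tsc_delta * ref_freq fits in 64 bits. */
	return tsc_delta * ref_freq / ref_delta;
}
```

The result can only be as precise as the reference counter, which matches the
assumption stated in the commit message.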


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
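
The guard described in 1.72 can be sketched as a range check against the
highest leaf the CPU reports. This is an illustration with hypothetical names;
`max_basic` stands for the value tracked as cpuid_level and `max_ext` for
ci_pnfeatset.

```c
#include <stdint.h>

/*
 * Return nonzero if a CPUID leaf may be queried. Extended leaves
 * (0x80000000 and up) are bounded by max_ext, basic leaves by
 * max_basic, regardless of what feature flags seem to imply.
 */
static int
cpuid_leaf_available(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000u)
		return leaf <= max_ext;
	return leaf <= max_basic;
}
```

For example, with a firmware-limited basic level of 0xb, leaf 0xd (the XSAVE
state leaf) would be reported as unavailable here even if a feature flag
suggests otherwise.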


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
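
Why Tj(Max) matters for accuracy can be sketched as below. The bit positions
are my reading of Intel's documentation (temperature target in bits 23:16 of
the target MSR, digital readout in bits 22:16 of the thermal status MSR) and
should be treated as assumptions; the function names are hypothetical.

```c
#include <stdint.h>

/* Tj(Max) in degrees C, taken from bits 23:16 of the target MSR value. */
static unsigned int
tjmax_from_target(uint64_t target_msr)
{
	return (unsigned int)((target_msr >> 16) & 0xff);
}

/*
 * The thermal status readout (bits 22:16) counts degrees below
 * Tj(Max), so the absolute temperature is the difference.
 */
static int
core_temp_c(uint64_t therm_status, unsigned int tjmax)
{
	unsigned int below = (unsigned int)((therm_status >> 16) & 0x7f);

	return (int)tjmax - (int)below;
}
```

Because the readout is relative, a hard-coded Tj(Max) that is off by some
number of degrees shifts every reported value by the same amount.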


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
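
The brand string cleanup (leading and duplicated spaces from 1.3, trailing
spaces from 1.56) can be sketched as one in-place pass. This is an
illustration with a hypothetical function name, not the kernel code.

```c
#include <string.h>

/*
 * Squeeze a string in place: drop leading spaces, collapse runs of
 * spaces to a single space, and drop trailing spaces.
 * Returns the length of the result.
 */
static size_t
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	while (*src == ' ')		/* leading spaces */
		src++;
	while (*src != '\0') {
		/* Skip a space that is followed by a space or the end. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```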


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.
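
The line size detection in 1.25 can be sketched as a field extraction. This
assumes the usual encoding, in which CPUID leaf 1 reports the CLFLUSH line
size in EBX bits 15:8 in units of 8 bytes; the function name is hypothetical.

```c
#include <stdint.h>

/* CLFLUSH line size in bytes from the EBX value of CPUID leaf 1. */
static unsigned int
clflush_line_size(uint32_t cpuid1_ebx)
{
	return ((cpuid1_ebx >> 8) & 0xff) * 8;
}
```

An EBX value with 0x08 in bits 15:8 would give a 64-byte line in this sketch.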


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
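
"Conditionally extended" in 1.16 refers to how the CPUID signature is decoded.
A minimal sketch following the rule in Intel's CPUID documentation (extended
family added only for base family 0xf, extended model used for base family 0x6
or 0xf); the struct and function names are hypothetical and the kernel's own
condition may differ.

```c
#include <stdint.h>

/* Hypothetical decoded form of the CPUID leaf 1 EAX signature. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;
	unsigned int base_family = (sig >> 8) & 0xf;

	s.stepping = sig & 0xf;
	s.model = (sig >> 4) & 0xf;
	s.family = base_family;

	/* Extended family (bits 27:20) applies only to base family 0xf. */
	if (base_family == 0xf)
		s.family += (sig >> 20) & 0xff;
	/* Extended model (bits 19:16) forms the high nibble of the model. */
	if (base_family == 0x6 || base_family == 0xf)
		s.model |= ((sig >> 16) & 0xf) << 4;
	return s;
}
```

Decoding once and storing the result avoids repeating this extraction at every
place that needs the family or model.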


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others, thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.114 17-Mar-2020 dlg

rework amd (not intel) smt/core/package detection.

the previous code relied on newer cpus having properly filled in
values for some new cpuid fields, but these are definitely not
filled in properly if you're running in a certain type of virtual
machine, which meant a lot of cores were misidentified as threads.

this new code follows what most other operating systems seem to do.
they read the "initial local apic id", which is globally unique in
a system, and cut it up into the package, core, and smt values. the
line between a package and the cores/threads inside a package is
determined by the "ApicIdSize". once the package is masked off, the
remaining core/thread id is divided up by the ThreadsPerCore value.
the latter defaults to 1, unless we're on a newer (eg, zen) chip
that provides a higher value.

this seems to work well across a variety of machines of different
vintages.

thanks to mark patruck, hrvoje popovski, and sthen@ for a lot of testing.
ok sthen@


Revision tags: OPENBSD_6_6_BASE
# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpus with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh
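
A minimal sketch of the sibling-matching idea described above, not the
kernel's actual code. It assumes each CPU has already recorded the core
and node IDs it read from cpuid 0x8000001e; the struct and function names
(cpu_topo, assign_smt_id) are hypothetical. Comparing the package/node ID
as well as the core ID mirrors the refinement made in revision 1.110.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU topology record; field names are illustrative. */
struct cpu_topo {
	unsigned int core_id;	/* CoreId from cpuid 0x8000001e EBX */
	unsigned int pkg_id;	/* NodeId from cpuid 0x8000001e ECX */
	unsigned int smt_id;	/* computed by assign_smt_id() */
};

/*
 * Assign an SMT ID to cpus[n] by counting the already-identified CPUs
 * (indices 0..n-1) that share both its core ID and its package ID.
 * The IDs are only compared for equality, never assumed sequential.
 */
void
assign_smt_id(struct cpu_topo *cpus, size_t n)
{
	unsigned int siblings = 0;
	size_t i;

	assert(cpus != NULL);
	for (i = 0; i < n; i++) {
		if (cpus[i].core_id == cpus[n].core_id &&
		    cpus[i].pkg_id == cpus[n].pkg_id)
			siblings++;
	}
	cpus[n].smt_id = siblings;
}
```

Calling assign_smt_id() for each CPU in identification order gives the
first thread on a core ID 0 and the second thread ID 1, regardless of how
sparse the core numbering is.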


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
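
The measurement described above boils down to a ratio of tick counts over
the same time window. The following is a sketch under that assumption
only; the function name and parameters are hypothetical and it is not the
code from this commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the TSC frequency in Hz from one measurement window.
 *   tsc_delta: TSC ticks elapsed during the window
 *   ref_delta: reference timecounter ticks elapsed during the window
 *   ref_freq:  known frequency of the reference timecounter in Hz
 * Returns 0 if the reference counter did not advance.
 */
uint64_t
tsc_freq_estimate(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_freq)
{
	if (ref_delta == 0)
		return 0;
	/* The product below must fit in 64 bits. */
	assert(ref_freq == 0 || tsc_delta <= UINT64_MAX / ref_freq);
	return tsc_delta * ref_freq / ref_delta;
}
```

The result can only be as precise as the reference: a coarse reference
counter yields few ticks per window and therefore a coarse estimate.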


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
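
A simplified sketch of the kind of guard this commit describes: check the
highest leaf the CPU reports before querying a given leaf. The function
name is hypothetical, and the hypervisor leaf range (0x40000000 and up) is
deliberately ignored here.

```c
#include <stdint.h>

/*
 * Return nonzero if a CPUID leaf may be queried.
 *   max_basic: EAX returned by CPUID leaf 0 (highest basic leaf)
 *   max_ext:   EAX returned by CPUID leaf 0x80000000 (highest extended leaf)
 * Basic leaves are checked against max_basic, extended leaves against
 * max_ext; a max_ext below 0x80000000 means no extended leaves exist.
 */
int
cpuid_leaf_available(uint32_t max_basic, uint32_t max_ext, uint32_t leaf)
{
	if (leaf >= 0x80000000U)
		return max_ext >= 0x80000000U && leaf <= max_ext;
	return leaf <= max_basic;
}
```

With such a check, a feature flag alone is not trusted: if firmware caps
the basic level at 0xb, leaf 0xd is skipped even when the flags suggest it
should be present.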


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
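
For illustration, a sketch of how the family and model are conventionally
extracted from the CPUID leaf 1 signature (EAX), following the rule Intel
documents. The struct and function names are hypothetical and this is not
the code from the commit; other vendors may apply the extended fields
under slightly different conditions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded values. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/*
 * Decode CPUID leaf 1 EAX.  Layout: stepping 3:0, model 7:4,
 * family 11:8, extended model 19:16, extended family 27:20.
 * The extended family is added only when the base family is 0xf;
 * the extended model is prepended when the base family is 6 or 0xf.
 */
void
decode_signature(uint32_t eax, struct cpu_sig *out)
{
	unsigned int base_family = (eax >> 8) & 0x0f;
	unsigned int base_model = (eax >> 4) & 0x0f;

	assert(out != NULL);
	out->stepping = eax & 0x0f;
	out->family = base_family;
	out->model = base_model;
	if (base_family == 0x0f)
		out->family += (eax >> 20) & 0xff;
	if (base_family == 0x06 || base_family == 0x0f)
		out->model += ((eax >> 16) & 0x0f) << 4;
}
```

Decoding once and storing the results means later code can compare plain
family and model numbers instead of repeating the bit extraction.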


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@
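
A sketch of the kind of in-place cleanup described above, written from
the description rather than taken from the diff; the function name is
hypothetical. It also drops trailing spaces, which the log attributes to
the later revision 1.56.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Squeeze a brand string in place: drop leading spaces, collapse each
 * run of spaces to a single space, and drop a trailing space.
 * Returns the resulting string length.
 */
size_t
squeeze_brand(char *s)
{
	char *src = s, *dst = s;

	assert(s != NULL);
	while (*src == ' ')		/* skip leading spaces */
		src++;
	while (*src != '\0') {
		/*
		 * Copy a space only if the previous output character was
		 * not one.  dst[-1] is safe: a space can only be seen
		 * here after at least one non-space has been copied.
		 */
		if (*src != ' ' || dst[-1] != ' ')
			*dst++ = *src;
		src++;
	}
	if (dst > s && dst[-1] == ' ')	/* at most one trailing space left */
		dst--;
	*dst = '\0';
	return (size_t)(dst - s);
}
```

Because the output never grows, the string can be rewritten in the same
buffer the CPUID brand leaves were copied into.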


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.113 14-Jun-2019 kettenis

Add TSC_ADJUST CPUID flag.

ok deraadt@, mlarkin@


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerability has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpus with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
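
The arithmetic behind such a recalibration can be sketched like this. It is
a minimal illustration under stated assumptions, not the committed code:
tsc_freq_from_reference() and its parameters are hypothetical names, and
the caller is assumed to have sampled both counters over the same interval.

```c
#include <stdint.h>

/*
 * Estimate the TSC frequency in Hz from a reference timecounter.
 *   tsc_delta: TSC ticks elapsed during the measurement
 *   ref_delta: reference counter ticks elapsed over the same interval
 *   ref_freq:  known frequency of the reference counter, in Hz
 * Returns 0 if the reference counter did not advance.
 * Assumes tsc_delta * ref_freq fits in 64 bits, which holds for short
 * measurement windows.
 */
uint64_t
tsc_freq_from_reference(uint64_t tsc_delta, uint64_t ref_delta,
    uint64_t ref_freq)
{
	if (ref_delta == 0)
		return 0;
	return tsc_delta * ref_freq / ref_delta;
}
```

The precision of the result is bounded by the resolution of the reference
counter, which is the point the entry makes.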


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
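
A guard of the kind described here might look like the sketch below. The
function name and parameters are hypothetical: max_basic stands for the
highest basic leaf reported by CPUID(0) and max_extended for the highest
extended leaf reported by CPUID(0x80000000).

```c
#include <stdint.h>

/*
 * Return nonzero if the given CPUID leaf may be queried, judged only by
 * the maximum leaf numbers the CPU reports, never by feature flags.
 */
int
cpuid_leaf_usable(uint32_t leaf, uint32_t max_basic, uint32_t max_extended)
{
	if (leaf >= 0x80000000U) {
		/* Extended range: the reported maximum must itself be valid. */
		return max_extended >= 0x80000000U && leaf <= max_extended;
	}
	return leaf <= max_basic;
}
```

Checking the reported maximum first avoids trusting a feature flag whose
backing leaf the firmware has disabled.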


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@
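
The preference order can be sketched without touching the real
instructions. Everything below is illustrative: entropy_src,
gather_entropy() and the src_* stand-ins are hypothetical names, with each
source modelled as a callback that reports whether it produced a value.

```c
#include <stddef.h>
#include <stdint.h>

/* A source returns 1 and stores a value on success, 0 if it is empty. */
typedef int (*entropy_src)(uint64_t *out);

/*
 * Try the preferred source (rdseed in the entry above) first and fall
 * back to the secondary one (rdrand) if the first is missing or empty.
 * Returns 1 if *out was filled in, 0 otherwise.
 */
int
gather_entropy(entropy_src seed_src, entropy_src rand_src, uint64_t *out)
{
	if (seed_src != NULL && seed_src(out))
		return 1;
	if (rand_src != NULL && rand_src(out))
		return 1;
	return 0;
}

/* Stand-in sources for experimentation; not real hardware access. */
int src_empty(uint64_t *out) { (void)out; return 0; }
int src_fixed_a(uint64_t *out) { *out = 0xA; return 1; }
int src_fixed_b(uint64_t *out) { *out = 0xB; return 1; }
```

Passing NULL for the first argument models a CPU without the preferred
instruction.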


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on,
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is joint work with jsg@. The fixed-function
performance counter read code comes from an earlier diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
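
For reference, extracting the conditionally extended values from the
CPUID(1) signature can be sketched as below. This follows the commonly
documented convention and is not copied from the source file; struct
cpu_sig and decode_signature() are hypothetical names.

```c
#include <stdint.h>

/* Decoded view of the CPUID(1).EAX signature (illustrative only). */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;
	unsigned int base_family = (sig >> 8) & 0xf;

	s.family = base_family;
	s.model = (sig >> 4) & 0xf;
	s.stepping = sig & 0xf;

	/* The extended model bits only count for base family 6 or 0xf. */
	if (base_family == 0x6 || base_family == 0xf)
		s.model |= ((sig >> 16) & 0xf) << 4;
	/* The extended family is added only when the base family is 0xf. */
	if (base_family == 0xf)
		s.family += (sig >> 20) & 0xff;
	return s;
}
```

Doing this once and caching the result is what lets later code compare
against a single family or model number.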


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@
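
The cleanup can be sketched as an in-place pass over the string. This is
an illustration rather than the backported code: squeeze_spaces() is a
hypothetical name, and the sketch also drops a trailing space, which a
later entry (1.56) adds to the real code.

```c
#include <stddef.h>

/*
 * Remove leading spaces, collapse runs of spaces to one, and drop a
 * trailing space, all in place. Returns the new string length.
 */
size_t
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* Keep only the last space of any run. */
		if (src[0] == ' ' && src[1] == ' ') {
			src++;
			continue;
		}
		*dst++ = *src++;
	}

	/* At most one space can remain at the end; remove it. */
	if (dst > s && dst[-1] == ' ')
		dst--;
	*dst = '\0';
	return (size_t)(dst - s);
}
```

Because the output is never longer than the input, no second buffer is
needed.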


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.112 28-May-2019 guenther

Correct the test for when the L1TF vulnerablity has been mitigated via
either hardware update (RDCL_NO) or our being nested in a VM which is
handling the flushing via the L1D_FLUSH MSR.

ok mlarkin@


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior than that is used, otherwise
use the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

branches: 1.110.2;
Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadtipper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

branches: 1.109.2;
Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, If the LFENCE serialization MSR bit is already set, then
we don't need to uncondtionally set it.

Worksaround a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on an single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariants TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
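
A minimal sketch of the kind of guard this commit describes, assuming the
highest basic and extended leaf numbers have already been read from CPUID(0)
and CPUID(0x80000000). The function and parameter names are illustrative and
are not taken from identcpu.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether a CPUID leaf may be queried.
 * max_basic is eax from CPUID(0); max_ext is eax from CPUID(0x80000000).
 */
bool
cpuid_leaf_usable(uint32_t max_basic, uint32_t max_ext, uint32_t leaf)
{
	/* Extended leaves are bounded by the extended maximum. */
	if (leaf >= 0x80000000U)
		return max_ext >= 0x80000000U && leaf <= max_ext;
	/* Basic leaves are bounded by the basic maximum. */
	return leaf <= max_basic;
}
```

A caller would test the leaf first and skip the feature when the firmware
has lowered the reported maximum, whatever the feature flags suggest.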


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
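
A rough sketch of brand-string cleanup, assuming a NUL-terminated string that
may be edited in place. It handles trailing spaces as described here, plus the
leading and duplicated spaces mentioned in revision 1.3. The helper name
brand_squeeze is hypothetical and this is not the code from identcpu.c.

```c
/*
 * Hypothetical helper: squeeze a CPU brand string in place.
 * Drops leading spaces, collapses runs of spaces to a single
 * space, and drops trailing spaces.
 */
void
brand_squeeze(char *s)
{
	char *src = s, *dst = s;

	while (*src == ' ')		/* skip leading spaces */
		src++;
	while (*src != '\0') {
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;		/* duplicated or trailing space */
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```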


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@
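
The fallback described above can be sketched as follows. This is only an
illustration under stated assumptions: the function name, its parameters, and
the idea of passing both measurements in as plain values are hypothetical and
do not reflect how identcpu.c is actually structured.

```c
#include <stdint.h>

/*
 * Hypothetical selection logic: prefer the performance-counter
 * measurement, but fall back to the rdtsc-based value when the
 * counters are already in use or reported a speed of 0.
 */
uint64_t
pick_cpuspeed(int counters_in_use, uint64_t perfctr_hz, uint64_t rdtsc_hz)
{
	if (counters_in_use || perfctr_hz == 0)
		return rdtsc_hz;
	return perfctr_hz;
}
```

Rejecting a zero result at this point keeps the value out of any later
division.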


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is joint work with jsg@. The fixed-function
performance counter read code comes from an earlier diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
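
For illustration, a sketch of decoding the CPUID(1).eax signature into
family, model, and stepping. It follows the convention in the vendor CPUID
documentation as I understand it; the struct and function names are
hypothetical, and identcpu.c may apply the extended fields under slightly
different conditions.

```c
#include <stdint.h>

struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/*
 * Decode the CPUID(1).eax signature. The extended family is added
 * only when the base family is 0xf; the extended model is used only
 * when the base family is 0x6 or 0xf.
 */
struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig sig;
	uint32_t base_family = (eax >> 8) & 0xf;

	sig.stepping = eax & 0xf;
	sig.model = (eax >> 4) & 0xf;
	sig.family = base_family;
	if (base_family == 0xf)
		sig.family += (eax >> 20) & 0xff;
	if (base_family == 0x6 || base_family == 0xf)
		sig.model |= ((eax >> 16) & 0xf) << 4;
	return sig;
}
```

Doing this once and storing the result means later code can compare a
family or model value directly instead of repeating the bit extraction.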


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.111 17-May-2019 guenther

Mitigate Intel's Microarchitectural Data Sampling vulnerability.
If the CPU has the new VERW behavior then that is used, otherwise
the proper sequence from Intel's "Deep Dive" doc is used in the
return-to-userspace and enter-VMM-guest paths. The enter-C3-idle
path is not mitigated because it's only a problem when SMT/HT is
enabled: mitigating everything when that's enabled would be a _huge_
set of changes that we see no point in doing.

Update vmm(4) to pass through the MSR bits so that guests can apply
the optimal mitigation.

VMM help and specific feedback from mlarkin@
vendor-portability help from jsg@ and kettenis@
ok kettenis@ mlarkin@ deraadt@ jsg@


Revision tags: OPENBSD_6_5_BASE
# 1.110 20-Oct-2018 kettenis

Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@
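
One way to picture the calculation: count the already-identified CPUs that
share both the package (die) ID and the core ID with the CPU being
identified. The sketch below assumes that approach; the struct, its field
names, and the function are hypothetical and are not the kernel's data
structures.

```c
#include <stddef.h>
#include <stdint.h>

struct topo {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/*
 * Hypothetical helper: assign an smt id to cpus[n] by counting the
 * already-identified CPUs (indices 0..n-1) that share both its
 * package id and its core id.
 */
void
assign_smt_id(struct topo *cpus, size_t n)
{
	uint32_t id = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (cpus[i].pkg_id == cpus[n].pkg_id &&
		    cpus[i].core_id == cpus[n].core_id)
			id++;
	}
	cpus[n].smt_id = id;
}
```

If only the core ID were compared, core 0 on a second die would look like
an extra thread of core 0 on the first die.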


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
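
A small sketch of the check-before-write idea, assuming the current MSR value
has already been read. The bit position comes from the "MSR C001_1029[1]"
description in revision 1.103; the macro and function names are hypothetical,
and the real code performs the MSR read and write itself.

```c
#include <stdint.h>

#define LFENCE_SERIALIZE_BIT	(1ULL << 1)	/* MSR C001_1029, bit 1 */

/*
 * Hypothetical helper: given the current MSR value, return 1 and store
 * the value to write when the bit still needs setting; return 0 when
 * the bit is already set and the write can be skipped.
 */
int
lfence_msr_needs_write(uint64_t cur, uint64_t *newval)
{
	if (cur & LFENCE_SERIALIZE_BIT)
		return 0;
	*newval = cur | LFENCE_SERIALIZE_BIT;
	return 1;
}
```

Skipping the write when the bit is already set avoids touching the MSR at
all on hosts where the write itself faults.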


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
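
For reference, the "conditionally extended" family/model extraction that 1.16 caches can be sketched as follows. The struct and function names are hypothetical and the layout of struct cpu_info is not reproduced here; the bit fields are those of the CPUID(1).EAX signature as documented by Intel and AMD.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/*
 * Decode CPUID(1).EAX. The extended model applies when the base
 * family is 6 or 0xf; the extended family is added only when the
 * base family is 0xf.
 */
static struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig s;

	s.stepping = eax & 0xf;
	s.model = (eax >> 4) & 0xf;
	s.family = (eax >> 8) & 0xf;

	if (s.family == 0x6 || s.family == 0xf)
		s.model |= ((eax >> 16) & 0xf) << 4;
	if (s.family == 0xf)
		s.family += (eax >> 20) & 0xff;

	return s;
}
```

Decoding once and storing the result avoids repeating these shifts wherever the family or model is tested.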


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.110 20-Oct-2018 kettenis

Take the "package" into account when calculating the "smt" ID on modern
AMD CPUs. Avoids knocking out too many processor threads on for example
the AMD Ryzen Threadripper 2990WX which apparently consists of 4 separate
dies with 8 cores each. Note that the "package" ID really is a "die" ID
here.

ok sthen@


Revision tags: OPENBSD_6_4_BASE
# 1.109 04-Oct-2018 guenther

Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, If the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
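
The check-before-write logic from 1.103 and 1.104 amounts to something like the sketch below. The macro and function names are hypothetical; only the bit position, MSR C001_1029[1], comes from the commit messages. A real kernel would read and write the MSR with its own rdmsr/wrmsr primitives, which are left out here so the logic can be shown on plain values.

```c
#include <stdint.h>

/* MSR C001_1029 bit 1: make LFENCE dispatch serializing. */
#define LFENCE_SERIALIZE_BIT	(1ULL << 1)

/* Return nonzero if the MSR still needs to be written. */
static int
lfence_msr_needs_write(uint64_t cur)
{
	return (cur & LFENCE_SERIALIZE_BIT) == 0;
}

/* Value to write back: the current contents with the bit set. */
static uint64_t
lfence_msr_new_value(uint64_t cur)
{
	return cur | LFENCE_SERIALIZE_BIT;
}
```

Skipping the write when the bit is already set is what avoids the #GP fault seen under the hypervisor mentioned above.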


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inapproapriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove uneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
archictectures and core relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isnt reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposed a number of operations performed by the
driver is visible throught the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
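
As a rough illustration of the extraction this entry refers to, the sketch
below decodes a CPUID(1).EAX signature into family, model, and stepping,
applying the extended fields only when the base family calls for them. The
struct and function names are hypothetical and are not taken from
identcpu.c; only the bit layout follows the vendors' CPUID documentation.

```c
#include <stdint.h>

/* Hypothetical container; the kernel keeps these in struct cpu_info. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/*
 * Decode a CPUID(1).EAX signature.  The extended family is added only
 * when the base family is 0xf; the extended model is used only when the
 * base family is 0x6 or 0xf.
 */
static struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig s;
	uint32_t base_family = (eax >> 8) & 0xf;

	s.stepping = eax & 0xf;
	s.model = (eax >> 4) & 0xf;
	s.family = base_family;

	if (base_family == 0xf)
		s.family += (eax >> 20) & 0xff;
	if (base_family == 0x6 || base_family == 0xf)
		s.model |= ((eax >> 16) & 0xf) << 4;

	return s;
}
```

Decoding once and storing the result means later checks can compare plain
family and model numbers instead of repeating the shifts and masks.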


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.109 04-Oct-2018 guenther

Use PCIDs where they and the INVPCID instruction are available.
This uses one PCID for kernel threads, one for the U+K tables of
normal processes, one for the matching U-K tables (when meltdown
in effect), and one for temporary mappings when poking other
processes. Some further tweaks are envisioned but this is good
enough to provide more separation and has (finally) been stable
under ports testing.

lots of ports testing and valid complaints from naddy@ and sthen@
feedback from mlarkin@ and sf@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh
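
The "check against other already-id'd cpus" step could look roughly like
the sketch below. It assumes the core and node IDs have already been read
from cpuid 0x8000001e; the struct, its field names, and the function are
hypothetical, and the actual kernel code may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU topology record. */
struct cpu_topo {
	uint32_t pkg_id;	/* node/package identifier */
	uint32_t core_id;	/* core identifier, treated as an opaque number */
	uint32_t smt_id;	/* derived: index of this thread on its core */
};

/*
 * Derive the SMT id of `ci` by counting how many of the first `n_known`
 * already-identified CPUs share its package and core IDs.
 */
static void
assign_smt_id(const struct cpu_topo *known, size_t n_known,
    struct cpu_topo *ci)
{
	size_t i;

	ci->smt_id = 0;
	for (i = 0; i < n_known; i++) {
		if (known[i].pkg_id == ci->pkg_id &&
		    known[i].core_id == ci->core_id)
			ci->smt_id++;
	}
}
```

Because the core IDs are only ever compared for equality, non-contiguous
numbering such as that seen on the EPYC 7551p does not affect the result.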


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
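
A minimal sketch of the kind of guard described here: before querying a
leaf, compare it against the highest basic or extended leaf the CPU
reports. The function and parameter names are hypothetical; in the kernel
the two limits correspond to cpuid_level and ci->ci_pnfeatset.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if `leaf` may be queried.  `max_basic` is the highest
 * basic leaf (CPUID(0).EAX) and `max_ext` the highest extended leaf
 * (CPUID(0x80000000).EAX), or 0 when extended leaves are unsupported.
 */
static bool
cpuid_leaf_available(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000)
		return leaf <= max_ext;
	return leaf <= max_basic;
}
```

Checking the reported limit rather than inferring it from feature flags
covers the case in this entry, where a BIOS disables a leaf that the flags
would otherwise imply.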


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
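
To show why Tj(Max) matters, the sketch below converts the relative
readout into degrees Celsius. It assumes the documented Intel layout
(digital readout in bits 22:16 of IA32_THERM_STATUS, temperature target in
bits 23:16 of MSR_TEMPERATURE_TARGET). The function name and the fallback
value of 100 are assumptions for illustration, not the driver's code.

```c
#include <stdint.h>

/* Assumed fallback when no temperature target is reported. */
#define TJMAX_FALLBACK	100

/*
 * Convert raw MSR values to degrees Celsius.  The readout counts
 * degrees below Tj(Max), so the absolute value is Tj(Max) - readout.
 */
static int
core_temp_celsius(uint64_t therm_status, uint64_t temp_target)
{
	int tjmax = (int)((temp_target >> 16) & 0xff);
	int below = (int)((therm_status >> 16) & 0x7f);

	if (tjmax == 0)
		tjmax = TJMAX_FALLBACK;
	return tjmax - below;
}
```

With a hard-coded Tj(Max), any CPU whose real limit differs would report a
temperature that is off by that difference, which is the inaccuracy this
entry addresses.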


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
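
Combined with the earlier removal of leading and duplicated spaces, the
brand string cleanup can be sketched as a single in-place pass. This is an
illustrative helper with a hypothetical name, not the code in identcpu.c.

```c
/*
 * Normalize a CPU brand string in place: drop leading spaces, collapse
 * runs of spaces to a single space, and drop trailing spaces.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* Drop a space that is followed by a space or the end. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

The pass only ever writes at or before the read position, so no scratch
buffer is needed.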


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


# 1.108 24-Aug-2018 jsg

print cpu family/model/stepping in dmesg
discussed with deraadt@ bluhm@ and sthen@


# 1.107 21-Aug-2018 deraadt

Perform mitigations for Intel L1TF screwup. There are three options:
(1) Future cpus which don't have the bug, (2) cpu's with microcode
containing a L1D flush operation, (3) stuffing the L1D cache with fresh
data and expiring old content. This stuffing loop is complicated and
interesting, no details on the mitigation have been released by Intel so
Mike and I studied other systems for inspiration. Replacement algorithm
for the L1D is described in the tlbleed paper. We use a 64K PA-linear
region filled with trapsleds (in case there is L1D->L1I data movement).
The TLBs covering the region are loaded first, because TLB loading
apparently flows through the D cache. Before performing vmlaunch or
vmresume, the cachelines covering the guest registers are also flushed.
with mlarkin, additional testing by pd, handy comments from the
kettenis and guenther peanuts


# 1.106 15-Aug-2018 jsg

add cpuid and msr bits from
'Deep Dive: CPUID Enumeration and Architectural MSRs'
ok deraadt@


# 1.105 08-Aug-2018 jsg

Recognise 'Speculative Store Bypass Disable' support cpuid bit.
Documented in 'Speculative Execution Side Channel Mitigations'
revision 2.0.


# 1.104 01-Aug-2018 brynet

On AMD CPUs, if the LFENCE serialization MSR bit is already set, then
we don't need to unconditionally set it.

Works around a suspected bug in newer Linux KVM, which may trigger a
#GP fault on writes to this MSR.

ok mlarkin@
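
The check described in 1.104 can be sketched roughly as follows. This is
an illustration only, not the committed code: lfence_needs_write() and
the macro name are hypothetical, and the caller is assumed to do the
actual rdmsr/wrmsr. Only the MSR number C001_1029 and bit 1 come from
the 1.103 log message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 1 of MSR C001_1029 (per the 1.103 log): LFENCE dispatch serializing. */
#define DE_CFG_LFENCE_SKETCH	(1ULL << 1)

/*
 * Hypothetical helper: given the current MSR value, decide whether a
 * write is needed.  Returns 1 and stores the value to write in *newval
 * if the bit is clear; returns 0 (and leaves *newval alone) if the bit
 * is already set, so the caller can skip the wrmsr entirely.
 */
static int
lfence_needs_write(uint64_t cur, uint64_t *newval)
{
	assert(newval != NULL);
	if (cur & DE_CFG_LFENCE_SKETCH)
		return 0;
	*newval = cur | DE_CFG_LFENCE_SKETCH;
	return 1;
}
```

Skipping the write when the bit is already set is what avoids the #GP
fault mentioned above on hypervisors that reject writes to this MSR.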


# 1.103 23-Jul-2018 brynet

Add "Mitigation G-2" per AMD's Whitepaper "Software Techniques for
Managing Speculation on AMD Processors"

By setting MSR C001_1029[1]=1, LFENCE becomes a dispatch serializing
instruction.

Tested on AMD FX-4100 "Bulldozer", and Linux guest in SVM vmd(8)

ok deraadt@ mlarkin@


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
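
The arithmetic behind this kind of recalibration can be sketched as
below. It is a minimal illustration under stated assumptions, not the
committed code: tsc_freq_estimate() and its parameters are hypothetical
names, and the deltas are assumed to be sampled over the same interval
from the TSC and from a reference timecounter of known frequency.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate the TSC frequency in Hz from the number
 * of TSC ticks (tsc_delta) and reference ticks (ref_delta) seen over
 * the same interval, given the reference frequency ref_hz.
 * The product is assumed not to overflow 64 bits, which holds for
 * short sampling intervals.
 */
static uint64_t
tsc_freq_estimate(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_hz)
{
	assert(ref_delta != 0);
	return (tsc_delta * ref_hz) / ref_delta;
}
```

With a made-up 1 MHz reference, 100000 reference ticks and 240000000
TSC ticks over the same window give an estimate of 2.4 GHz. The
estimate can only be as precise as the reference counter's resolution,
which is the point made in the log message above.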


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
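
A guard of the kind described in 1.72 might look like the sketch below.
It is an assumption-laden illustration rather than the kernel's code:
cpuid_leaf_usable() is a hypothetical name, max_basic stands in for
cpuid_level (the highest basic leaf from CPUID(0)) and max_ext for
ci_pnfeatset (the highest extended leaf from CPUID(0x80000000)).

```c
#include <stdint.h>

/*
 * Hypothetical helper: return 1 if a CPUID leaf may be queried, 0 if
 * the CPU (or the BIOS, by capping the leaf range) does not expose it.
 */
static int
cpuid_leaf_usable(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000U) {
		/* Extended range: max_ext must itself be a valid extended leaf. */
		return max_ext >= 0x80000000U && leaf <= max_ext;
	}
	return leaf <= max_basic;
}
```

For example, if the basic range is capped at 0xb, leaf 0xd is rejected
even when a feature flag suggests it should be present.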


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fallback to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
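
The brand string cleanup (leading and duplicated spaces in 1.3,
trailing spaces in 1.56) can be illustrated with the sketch below. It
is not the code from the tree: brand_squeeze() is a hypothetical
helper, written only to show one way of doing all three steps in a
single in-place pass.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: strip leading spaces, collapse runs of spaces
 * to one, and drop trailing spaces, all in place.
 */
static void
brand_squeeze(char *s)
{
	char *src = s, *dst = s;

	assert(s != NULL);
	/* Skip leading spaces. */
	while (*src == ' ')
		src++;
	while (*src != '\0') {
		/* Drop a space that is followed by another space or by the end. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

Because the output is never longer than the input, the copy can reuse
the same buffer and needs no allocation.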


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of him.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
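
For readers unfamiliar with the "conditionally extended" values, the
sketch below decodes them from the CPUID(1) EAX signature. It is a
hedged illustration following the rule in Intel's CPUID documentation,
not the code committed here: struct cpu_sig and decode_signature() are
hypothetical names, and the exact conditions used in the tree may
differ.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature fields. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
	uint32_t stepping;
};

/*
 * Decode family/model/stepping from CPUID(1).EAX.  The extended family
 * is added only when the base family is 0xf; the extended model is
 * merged in only when the base family is 0x6 or 0xf.
 */
static struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig s;
	uint32_t base_family = (eax >> 8) & 0xf;

	s.family = base_family;
	s.model = (eax >> 4) & 0xf;
	s.stepping = eax & 0xf;
	if (base_family == 0xf)
		s.family += (eax >> 20) & 0xff;
	if (base_family == 0x6 || base_family == 0xf)
		s.model |= ((eax >> 16) & 0xf) << 4;
	return s;
}
```

Storing the decoded values once means later code can compare against a
family or model number directly instead of repeating the bit
extraction.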


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others, thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.102 12-Jul-2018 guenther

Reorganize the Meltdown entry and exit trampolines for syscall and
traps so that the "mov %rax,%cr3" is followed by an infinite loop
which is avoided because the mapping of the code being executed is
changed. This means the sysretq/iretq isn't even present in that
flow of instructions in the kernel mapping, so userspace code can't
be speculatively reached on the kernel mapping and totally eliminates
the conditional jump over the the %cr3 change that supported CPUs
without the Meltdown vulnerability. The return paths were probably
vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
executing user code post-system-call with the kernel mappings, thus
creating cache/TLB/etc side-effects.

Would like to apply this technique to the interrupt stubs too, but
I'm hitting a bug in clang's assembler which misaligns the code and
symbols.

While here, when on a CPU not vulnerable to Meltdown, codepatch out
the unnecessary bits in cpu_switchto().

Inspiration from sf@, refined over dinner with theo
ok mlarkin@ deraadt@


# 1.101 11-Jul-2018 guenther

Declare cpu_meltdown in <machine/cpu.h>


# 1.100 03-Jul-2018 jsg

add amd speculation control cpuid bits

documented in 'AMD64 Technology Indirect Branch Control Extension'
and 'Speculative Store Bypass Disable'

ok mlarkin@ deraadt@


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on an single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariants TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
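
The check described in 1.72 can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name cpuid_leaf_usable and its parameters are hypothetical, with max_basic standing in for the value CPUID(0) returns in eax (cpuid_level) and max_ext for the value CPUID(0x80000000) returns in eax (ci_pnfeatset).

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from identcpu.c): decide whether a CPUID leaf
 * may be queried, given the highest basic and extended leaves the CPU
 * reports. Firmware can lower these limits, so feature flags alone are
 * not enough to assume a leaf exists.
 */
int
cpuid_leaf_usable(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	/* Extended leaves start at 0x80000000 and have their own limit. */
	if (leaf >= 0x80000000U)
		return max_ext >= 0x80000000U && leaf <= max_ext;

	/* Basic leaves are bounded by the CPUID(0) level. */
	return leaf <= max_basic;
}
```

A caller would guard each leaf access with such a test and treat a missing leaf as "feature not available" instead of trusting the flag bits.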


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty, fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
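
Taken together with the leading and duplicated space removal from 1.3, the brand string cleanup can be sketched like this. The helper name brand_squeeze is hypothetical and the code is an illustration of the idea, not the kernel's implementation.

```c
/*
 * Hypothetical helper (not from identcpu.c): clean up a CPU brand
 * string in place by dropping leading spaces, collapsing runs of
 * spaces into one, and dropping trailing spaces.
 */
void
brand_squeeze(char *s)
{
	char *src = s, *dst = s;

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	/* Copy the rest, keeping only the last space of each run. */
	while (*src != '\0') {
		if (*src == ' ' && src[1] == ' ') {
			src++;
			continue;
		}
		*dst++ = *src++;
	}

	/* A trailing run collapses to one space; remove it too. */
	if (dst > s && dst[-1] == ' ')
		dst--;
	*dst = '\0';
}
```

Working in place avoids a second buffer, which suits a fixed-size brand string filled from the CPUID brand leaves.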


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on,
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is joint work with jsg@. The fixed-function
performance counter read code comes from an earlier diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA Nanos support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
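
For reference, extracting the conditionally extended family and model from the CPUID(1) signature can be sketched as below. The function names sig_family and sig_model are hypothetical; the commit stores the results in struct cpu_info, and the field names it uses are not shown in this log. The extended-model condition follows Intel's documented rule (base family 6 or 0xf).

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not from identcpu.c): decode family and model
 * from the CPUID(1).eax signature.
 */
uint32_t
sig_family(uint32_t sig)
{
	uint32_t family = (sig >> 8) & 0x0f;

	/* Extended family (bits 27:20) is added only for base family 0xf. */
	if (family == 0x0f)
		family += (sig >> 20) & 0xff;
	return family;
}

uint32_t
sig_model(uint32_t sig)
{
	uint32_t base_family = (sig >> 8) & 0x0f;
	uint32_t model = (sig >> 4) & 0x0f;

	/* Extended model (bits 19:16) supplies the high nibble. */
	if (base_family == 0x06 || base_family == 0x0f)
		model |= ((sig >> 16) & 0x0f) << 4;
	return model;
}
```

Doing this once and caching the result means later checks such as "ci_model == 0xe" in 1.17 compare against an already decoded value.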


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.99 28-Jun-2018 sthen

remove other chunk of accidentally committed test code, spotted by deraadt


# 1.98 28-Jun-2018 sthen

remove accidentally committed test code, spotted by deraadt


# 1.97 20-Jun-2018 sthen

On newer AMD parts, use CoreId (EBX) and NodeId (ECX) from cpuid 0x8000001e
to detect smt cores. As there's no "smt id" on these like there is on Intel
parts, check against other already-id'd cpus to detect which are additional
smt threads on a core.

jmatthew noticed some unusual (non-contiguous) numbering on a single
socket EPYC 7551p but there's no indication that the actual ID numbers
need to be sequential.

"As long as we treat ci_core_id as just a number, that shouldn't be an
issue" and OK kettenis@

ref: 54945 rev 1.14 - PPR for AMD Family 17h Models 00h-0Fh
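
The matching against already-identified CPUs described in 1.97 can be sketched as follows. This is an assumption-laden illustration, not the committed code: struct cpu_topo and assign_smt_id are hypothetical names, and the sketch treats core_id as an opaque number, as the quoted remark from kettenis@ suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU topology record (not the kernel's cpu_info). */
struct cpu_topo {
	uint32_t pkg_id;
	uint32_t core_id;
	uint32_t smt_id;
};

/*
 * Hypothetical helper: give cpus[n] an smt_id by counting the CPUs
 * identified before it that share its package and core. Core IDs are
 * only compared for equality, so gaps in the numbering do not matter.
 */
void
assign_smt_id(struct cpu_topo *cpus, size_t n)
{
	uint32_t siblings = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (cpus[i].pkg_id == cpus[n].pkg_id &&
		    cpus[i].core_id == cpus[n].core_id)
			siblings++;
	}
	cpus[n].smt_id = siblings;
}
```

The first thread seen on a core gets smt_id 0 and each later sibling gets the next number, regardless of how the core IDs themselves are spaced.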


# 1.96 07-Jun-2018 guenther

Treat XSAVEOPT and other XSAVE extensions like other cpu flags

oddness noted by kettenis
ok mlarkin@ deraadt@


Revision tags: OPENBSD_6_3_BASE
# 1.95 21-Feb-2018 guenther

branches: 1.95.2;
Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

branches: 1.87.2;
SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

branches: 1.82.4;
add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that the TSC is not used if we really don't want it. The
kernel bumps the quality to 2000 for constant, invariant TSCs on the
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty, fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on,
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is joint work with jsg@. The fixed-function
performance counter read code comes from an earlier diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes the number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.
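
As a rough illustration of the detection described above: CPUID leaf 1
reports the CLFLUSH line size in EBX bits 15:8, in units of 8 bytes. The
function name below is hypothetical, and the caller is assumed to have
already executed CPUID(1) and saved EBX.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert the EBX value returned by CPUID leaf 1
 * into the CLFLUSH line size in bytes. Bits 15:8 hold the size in
 * 8-byte units.
 */
uint32_t
clflush_line_bytes(uint32_t cpuid1_ebx)
{
	return ((cpuid1_ebx >> 8) & 0xff) * 8;
}
```

For example, an EBX value with 0x08 in bits 15:8 corresponds to a 64-byte
line.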


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
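
A sketch of the "conditionally extended" decoding mentioned above, following
the rule in Intel's CPUID documentation: the extended family is added only
when the base family is 0xf, and the extended model is applied only when the
base family is 0x6 or 0xf. The struct and function names are hypothetical;
the real code stores the results in struct cpu_info. AMD documents the
extension for base family 0xf only, so the 0x6 case is an Intel-specific
assumption.

```c
#include <stdint.h>

/* Hypothetical container for the decoded values. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
};

/* Decode the CPUID(1).EAX signature into display family and model. */
struct cpu_sig
decode_signature(uint32_t eax)
{
	struct cpu_sig s;
	uint32_t base_family = (eax >> 8) & 0x0f;

	s.family = base_family;
	s.model = (eax >> 4) & 0x0f;

	/* Extended family (bits 27:20) is added for base family 0xf. */
	if (base_family == 0x0f)
		s.family += (eax >> 20) & 0xff;

	/* Extended model (bits 19:16) supplies the high nibble. */
	if (base_family == 0x06 || base_family == 0x0f)
		s.model |= ((eax >> 16) & 0x0f) << 4;

	return s;
}
```

Doing this once and caching the result avoids repeating the bit extraction
at every place that needs the family or model.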


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others, thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.95 21-Feb-2018 guenther

Meltdown: implement user/kernel page table separation.

On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)

When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.

mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.

ok mlarkin@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen
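
The calibration idea above can be sketched as a simple ratio: count TSC
ticks and reference-timer ticks over the same window, then scale by the
reference timer's known frequency. This is only an illustration under
assumed names (tsc_freq_estimate and its parameters are hypothetical), not
the code from the commit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the TSC frequency in Hz.
 *   tsc_delta  TSC ticks elapsed during the measurement window
 *   ref_delta  reference timecounter ticks elapsed in the same window
 *   ref_hz     known frequency of the reference timecounter
 * The product tsc_delta * ref_hz must fit in 64 bits, so the window
 * should be kept short.
 */
uint64_t
tsc_freq_estimate(uint64_t tsc_delta, uint64_t ref_delta, uint64_t ref_hz)
{
	if (ref_delta == 0)
		return 0;	/* no usable measurement */
	return tsc_delta * ref_hz / ref_delta;
}
```

The precision of the result is bounded by the resolution of the reference
timer, which is why a timer with a known frequency such as the HPET or PM
timer is used.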


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that the TSC is not used unless we really want it. The
kernel bumps the quality to 2000 for constant, invariant TSCs on the
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@
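
A minimal sketch of the kind of guard described above, assuming the caller
has cached the highest basic leaf (EAX from CPUID(0)) and the highest
extended leaf (EAX from CPUID(0x80000000)). The function name is
hypothetical; the commit itself tests cpuid_level and ci->ci_pnfeatset.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true only if the requested CPUID leaf is
 * within the range the CPU (or firmware) actually exposes.
 */
bool
cpuid_leaf_available(uint32_t leaf, uint32_t max_basic, uint32_t max_ext)
{
	if (leaf >= 0x80000000U)
		return leaf <= max_ext;
	return leaf <= max_basic;
}
```

With a check like this, a feature bit alone is not trusted: if firmware has
capped the maximum leaf below 0xd, for instance, the leaf is not queried
even when the feature flags suggest it should exist.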


# 1.71 27-Dec-2015 jsg

If available, prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty, fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.
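
To illustrate why Tj(Max) matters: Intel's digital thermal sensor reports
how many degrees the core is below Tj(Max), so the absolute temperature is
Tj(Max) minus the readout. The sketch below assumes the bit layout from
Intel's documentation (Tj(Max) in bits 23:16 of MSR_TEMPERATURE_TARGET, the
readout in bits 22:16 of the thermal status MSR) and takes raw MSR values as
arguments; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: extract Tj(Max) in degrees C from the raw MSR value. */
int
tjmax_from_target(uint64_t temperature_target)
{
	return (int)((temperature_target >> 16) & 0xff);
}

/*
 * Hypothetical helper: convert the thermal status readout, which counts
 * degrees below Tj(Max), into an absolute temperature in degrees C.
 */
int
core_temp_celsius(uint64_t therm_status, int tjmax)
{
	int below = (int)((therm_status >> 16) & 0x7f);

	return tjmax - below;
}
```

A hard-coded Tj(Max) that differs from the real one shifts every reading by
the same amount, which is the inaccuracy the commit addresses.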


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@
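
A sketch of the brand string cleanup as it stands after this change: leading
spaces, runs of duplicated spaces, and trailing spaces are all removed. This
is an illustrative in-place routine with a hypothetical name, not the code
from identcpu.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: squeeze a NUL-terminated brand string in place.
 * Leading and trailing spaces are dropped and interior runs of spaces
 * are collapsed to a single space.
 */
void
brand_squeeze(char *s)
{
	char *src = s, *dst = s;

	assert(s != NULL);

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* Drop a space that is followed by another space or the end. */
		if (*src == ' ' && (src[1] == ' ' || src[1] == '\0')) {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

Because the output is never longer than the input, the string can be
rewritten in the same buffer without extra allocation.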


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and code relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on,
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or if they gave us a cpuspeed of 0, so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is joint work with jsg@. The fixed-function
performance counter read code comes from an earlier diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.94 10-Feb-2018 jsg

Additional AMD CPUID bits documented in
"Processor Programming Reference (PPR) for AMD Family 17h
Model 01h, Revision B1 Processors"

ok mlarkin@ deraadt@


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariants TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and core relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)


# 1.93 15-Jan-2018 mlarkin

Add some AVX512 CPUID flags.

discussed with sf and kettenis


# 1.92 12-Jan-2018 mlarkin

IBRS -> IBRS,IBPB in identifycpu lines


# 1.91 07-Jan-2018 mlarkin

Add identcpu.c and specialreg.h definitions for the new Intel/AMD MSRs
that should help mitigate spectre. This is just the detection piece, these
features are not yet used.

Part of a larger ongoing effort to mitigate meltdown/spectre. i386 will
come later; it needs some machdep.c cleanup first.

ok kettenis@


# 1.90 18-Oct-2017 mikeb

Set TSC timecounter frequency to the CPU frequency estimate if unknown

ok mlarkin


# 1.89 14-Oct-2017 jsg

reduce the amount of includes in arch/amd64
ok mpi@ deraadt@


# 1.88 06-Oct-2017 mikeb

Recalibrate TSC timecounter with HPET and PM timer

If frequency of an invariant (non-stop) time stamp counter is measured
using an independent working timecounter that has a known frequency, we
can assume that the measured TSC frequency is as good as the resolution
of the timecounter that we use to perform the measurement. This lets us
switch from this high quality but expensive source to the cheaper TSC
without sacrificing precision on a wide range of modern CPUs.

From Adam Steen <adam@adamsteen.com.au> with tweaks from reyk@ and myself.

Tested by brynet@, sthen@ and others, OK mlarkin, sthen


Revision tags: OPENBSD_6_2_BASE
# 1.87 20-Jun-2017 mlarkin

SVM: better cleanbits handling. Fixes an issue on Bulldozer CPUs causing
#TF exceptions during guest VM boot

ok brynet


# 1.86 30-May-2017 deraadt

Support for SMAP is pretty small, so don't exclude it from the RAMDISKS.
ok jsg visa


# 1.85 19-May-2017 mlarkin

Respect max VPID/ASID limits. VMX VPIDs are capped at 4095, for now.


# 1.84 10-May-2017 tb

The setting of the cpu feature flags for PCLMUL and AES-NI was guarded with
!SMALL_KERNEL and CRYPTO. Move it out of !SMALL_KERNEL to make use of these
features on RAMDISK_CD. Fixes a performance regression in the installer
introduced with the new aes implementation. In particular, it halves the
time needed to extract baseXX.tgz and compXX.tgz on my T420.

tweaks & ok mikeb


# 1.83 14-Apr-2017 mlarkin

SVM: calculate max ASID value and save for later use. This will be used in
an upcoming diff to handle ASID/VPID reuse/rollover.


Revision tags: OPENBSD_6_1_BASE
# 1.82 28-Mar-2017 mlarkin

add RDTSCP flags to identcpu.c

ok guenther, deraadt


# 1.81 14-Feb-2017 reyk

Set the default TSC quality to -1000 to be less than the i8254

This makes sure that TSC is not used if we really don't want to. The
kernel bumps the quality to 2000 for constant invariant TSCs on
latest CPUs only.

OK mikeb@


# 1.80 13-Jan-2017 mikeb

Disable and lock Silicon Debug feature on modern Intel CPUs

This implements one of the countermeasures against using Direct
Connect Interface (DCI) to debug CPUs via USB3 mentioned in the
"Tapping into the core" talk at the 33c3: identify and disable
the Silicon Debug feature found in Haswell and newer CPUs.

ok mlarkin, deraadt


# 1.79 14-Dec-2016 reyk

Add the TSC timecounter and use it on Skylake machines where the HPET
is too slow and the invariant TSC more accurate.

The commit includes joint work by mikeb@ kettenis@ and me;
tested for some time by a large group of volunteers.

OK mikeb@ kettenis@


# 1.78 13-Oct-2016 martijn

Add an extra debug line when virtualization is disabled in the firmware.
This line would have saved me about an hour of hairpulling.

OK mlarkin@


# 1.77 30-Sep-2016 mlarkin

Compute CR3 target count. Needed for upcoming debugging diff.


# 1.76 27-Sep-2016 mlarkin

clarify a comment whose text became out of date with the previous commit


# 1.75 27-Sep-2016 mlarkin

read and cache VMFUNC capability during boot. for use in an upcoming diff


# 1.74 03-Sep-2016 mlarkin

add SDBG to cpuid bits and identcpu


Revision tags: OPENBSD_6_0_BASE
# 1.73 22-Jun-2016 mlarkin

Identify UMIP feature, if available.

ok millert, kettenis, deraadt


Revision tags: OPENBSD_5_9_BASE
# 1.72 03-Feb-2016 guenther

Test cpuid_level or ci->ci_pnfeatset before using a CPUID leaf; some BIOSes
can disable leaves that CPU feature flags would seem to imply. Corrects
signal delivery on systems where the AVX leaf is disabled.

report and debugging help from Marcus MERIGHI (mcmer-openbsd (at) tor.at)
ok kettenis@


# 1.71 27-Dec-2015 jsg

If available prefer the rdseed instruction over rdrand when adding entropy
to the kernel rng. If the rdseed source is empty fall back to rdrand
as suggested by naddy. rdrand output comes from a prng that is
periodically reseeded. rdseed should give us more bits of entropy.

ok naddy@ djm@ deraadt@


# 1.70 12-Dec-2015 reyk

Identify hypervisors before configuring other children of the mainbus
(bios, CPU, interrupt handlers, pvbus). This splits the pvbus attach
function into two parts: pvbus_identify() to scan the CPUID registers
for supported hypervisors and pvbus_attach() to attach the bus, print
information, and configure the children.

This will be needed for Xen and KVM, as discussed with mikeb@ and sf@
OK mlarkin@


# 1.69 07-Dec-2015 jsg

Add cpuid bits documented in the August 2015 revision of
"Intel Architecture Instruction Set Extensions Programming Reference"


# 1.68 05-Dec-2015 kettenis

AMD Family 12h and later processors keep their APIC clock running in deeper
C-states. Set the TMP_ARAT flag for these (which is Intel-specific) such
that acpicpu(4) enables the deeper C-states on these CPUs.

ok deraadt@


# 1.67 23-Nov-2015 deraadt

No longer need 'option VMM', declaring the vmm0 device is sufficient.
ok mlarkin


# 1.66 13-Nov-2015 mlarkin

vmm(4) kernel code

circulated on hackers@, no objections. Disabled by default.


# 1.65 07-Nov-2015 naddy

Allow overriding ghash_update() with an optimized MD function. Use
this on amd64 to provide a version that uses the PCLMUL instruction
on CPUs that support it but don't have AESNI. ok mikeb@


# 1.64 12-Aug-2015 mlarkin

Incorrect comparison when accessing cpuid extended function 0x80000007.

ok kettenis@, guenther@


Revision tags: OPENBSD_5_8_BASE
# 1.63 21-Jul-2015 reyk

Add pvbus(4), a pseudo-bus to attach non-PCI paravirtual devices and buses.
vmt(4) is moved from mainbus0 to pvbus0, more devices will follow.

OK sf@ deraadt@


# 1.62 28-May-2015 guenther

Save the cpuid(6) eax bits in the cpu_info and report the SENSOR and ARAT
bits from it.

ok krw@ kettenis@


# 1.61 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.60 08-Feb-2015 deraadt

Only attach cpu-based sensors on the primary cpu, for two reasons
- The sensor framework cannot fetch values on the right cpu
- sensor_task_register() calls malloc, and calling it is inappropriate
ok guenther


# 1.59 08-Feb-2015 mlarkin

Typo "fature" -> "feature"


# 1.58 19-Jan-2015 jsg

Make use of an msr available on recent Intel processors to obtain the
maximum supported temperature, Tj(Max). As the temperature values are
relative to this value this should make the sensor values more accurate.

From Simon Mages.


# 1.57 16-Dec-2014 sf

Define and print HV cpuid flag.

This is set by many hypervisors, including kvm, vmware, hyper-v.


# 1.56 17-Oct-2014 kettenis

Also remove trailing spaces from the CPU brand string.

ok deraadt@, armani@


# 1.55 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


Revision tags: OPENBSD_5_6_BASE
# 1.54 13-Jul-2014 jasper

use nitems() instead of handrolling something identical

ok mpi@ sthen@


# 1.53 03-Jul-2014 matthew

Add identcpu detection for 1-GByte pages

ok mlarkin


Revision tags: OPENBSD_5_5_BASE
# 1.52 19-Nov-2013 guenther

format string fixes picked up with -Wformat=2

ok deraadt@


# 1.51 26-Sep-2013 jsg

Use the cpuid vendor string instead of the model string when enabling
VIA specific amd64 code. Makes the code work with Eden X2 processors
which have the same model/family as a Nano but don't claim to be one
in the model string.

from bytevolcano at Safe-mail.net


# 1.50 24-Aug-2013 mlarkin

fix use of uninitialized variables (used only in a DEBUG printf)

found by Maxime Villard


Revision tags: OPENBSD_5_4_BASE
# 1.49 30-Jul-2013 kettenis

Or in the CPUID_NXE bit from ci->ci_feature_eflags into ci->ci_feature_flags
to mimic what is done in locore.S. Otherwise we lose the CPUID_NXE bit.

ok matthew@


# 1.48 04-Jun-2013 haesbaert

Cpu topology for AMD64.

This adds information about smt id (thread), core id and package id
(socket) to amd64.

ci_smt_id, ci_core_id, ci_pkg_id should be followed by other
architectures and core relying on them should be under
ARCH_HAVE_CPU_TOPOLOGY.

ok tedu@


# 1.47 06-May-2013 dlg

the use of modern intel performance counter msrs to measure the number of
cycles per second isn't reliable, particularly inside "virtual" machines.
cpuspeed can be calculated as 0, which causes a divide by zero later on
which is bad.

this goes to more effort to detect if the performance counters are in use
by the hypervisor, or detecting if they gave us a cpuspeed of 0 so we can
fall through to using rdtsc.

the same change as:
src/sys/arch/i386/include/specialreg.h r.45
src/sys/arch/i386/isa/clock.c 1.49

ok jsg@


# 1.46 09-Apr-2013 guenther

Add missing #ifdef CRYPTO around amd64_has_aesni

Diff from Silamael (Silamael (at) coronamundi.de)


# 1.45 21-Mar-2013 kurt

style(9)


# 1.44 21-Mar-2013 kurt

Detect on-die temp sensor for Atom E6xx on amd64. Adapted from
diff submitted by Matt Dainty. okay jsg@


Revision tags: OPENBSD_5_3_BASE
# 1.43 10-Nov-2012 mglocker

Recent x86 CPUs come with a constant time stamp counter. If this is
the case we verify if the CPU supports a specific version of the
architectural performance monitoring feature and read out the current
frequency from the fixed-function performance counter of the unhalted
core.

My initial motivation to implement this was the Soekris net6501-70
which comes with an Intel Atom E6xx 1.60GHz CPU. It has a constant
time stamp counter plus speed step support and boots on the lowest
frequency of 600MHz. This caused hw.cpuspeed and hw.setperf to
reflect the wrong values.

The diff is a cooperation work with jsg@. The fixed-function
performance counter read code comes from a former diff of his.

OK jsg@


# 1.42 31-Oct-2012 jsg

Add support for Intel's Supervisor Mode Access Prevention (SMAP) feature.
When enabled SMAP will generate page faults on the kernel attempting
to read/write user data pages unless an override flag is set.

Instructions that modify the flag are patched into copyin/copyout and
friends on boot if SMAP is enabled.

Those with access to hardware with SMAP can contact me for a test case.

joint work with deraadt@

ok miod@ deraadt@


# 1.41 09-Oct-2012 jsg

Sync "Structured Extended Feature Flags" cpuid bits with
the August 2012 revision of
"Intel Architecture Instruction Set Extensions Programming Reference".

Correct definitions of EREP and INVPCID, rename EREP to ERMS to
match Intel's docs. Add some more Haswell feature bits.


# 1.40 09-Oct-2012 jsg

Enable Supervisor Mode Execution Protection (SMEP), found in recent
Intel chips. If the kernel is tricked into running code from a user
page while in supervisor mode we'll now get a page fault and panic
instead of running it.

suggestions and ok guenther@, ok deraadt@


# 1.39 19-Sep-2012 jsg

Add support for the rdrand instruction found in recent Intel processors.
Joint work with naddy@

ok naddy@ deraadt@


# 1.38 07-Sep-2012 naddy

bump CPU feature strings to 12 chars since some names are now 8 characters
long, leaving no space for a trailing NUL; ok kettenis@


# 1.37 24-Aug-2012 guenther

Synchronize CR4 and CPUID portions of <machine/specialreg.h> for i386 and amd64
Add display of more feature bits: DTES64 PCID DEADLINE F16C RDRAND
Add display of "Structured Extended Feature Flags Parameters":
FSGSBASE SMEP EREP INVPCID

ok mikeb@


Revision tags: OPENBSD_5_2_BASE
# 1.36 22-Apr-2012 haesbaert

Test vendor against cpu_vendor instead of calling CPUID, this matches
the other uses.

ok mikeb@


# 1.35 27-Mar-2012 haesbaert

Run identifycpu() on its own cpu.
Discussed with many on hackers.

"Go ahead" kettenis@
"Get to it" deraadt@


Revision tags: OPENBSD_5_1_BASE
# 1.34 08-Jan-2012 haesbaert

Make sure we only read cpuid 0x80000001 features if pnfeatset reports it.
This is already done in i386.

ok jsg "if there is no change to the flags in your dmesg"


# 1.33 26-Dec-2011 haesbaert

Add the missing ECX cpu flags from CPUID at 0x80000001.
This is all documented at:

http://support.amd.com/us/Embedded_TechDocs/25481.pdf (page 20)
http://www.intel.com/assets/pdf/appnote/241618.pdf (page 41)

ok jsg@


Revision tags: OPENBSD_5_0_BASE
# 1.32 29-May-2011 deraadt

Use k1x cpu scaling on all families 0x10 and above (the trend is likely to
continue); makes the AMD E-350 speed adjust (from slow to way slower).
discussion with jsg.


# 1.31 23-May-2011 claudio

AMD K10/K11 pstate driver allows setperf and apm to change CPU
frequencies on newer AMD systems.
Driver written by Bryan Steele / brynet gmail.com
Put it in deraadt@


Revision tags: OPENBSD_4_9_BASE
# 1.30 07-Sep-2010 mikeb

enable aesni.

that means that all users running ipsec on amd64 with 'aes'
cpu flag will have aes encryption accelerated in cbc and ctr
modes for all three key sizes: 128, 192 and 256.

for debug purposes a number of operations performed by the
driver is visible through the pstat(8) utility:

pstat -d u aesni_ops

note that you need to run config(8) to hook up new files.

ok kettenis thib deraadt


Revision tags: OPENBSD_4_8_BASE
# 1.29 01-Jul-2010 thib

Add things to enable aesni either ifdef'ed or commented out to ease
testing.

Note: aesni is not in a usable state yet!

OK deraadt@


# 1.28 26-Jun-2010 guenther

Don't #include <sys/user.h> into files that don't need the stuff
it defines. In some cases, this means pulling in uvm.h or pcb.h
instead, but most of the inclusions were just noise. Tested on
alpha, amd64, armish, hppa, i386, macppc, sgi, sparc64, and vax,
mostly by krw and naddy.
ok krw@


# 1.27 21-Mar-2010 jsg

Add some additional Intel CPUID values for recent and upcoming processors.
With some additions from sthen@

ok kettenis@ sthen@


Revision tags: OPENBSD_4_7_BASE
# 1.26 09-Dec-2009 deraadt

this does not even compile


# 1.25 09-Dec-2009 oga

Detect the cache line size for the clflush instruction when we identify
the cpu.

ok kettenis@ as part of a larger diff.


# 1.24 07-Oct-2009 kevlo

add support for the temperature sensor of VIA Nano and C7-M CPUs.
some improvements suggested by jsg@

"commit" deraadt@


# 1.23 20-Sep-2009 jsg

Back out via nano temperature sensor changes.
They break ramdisks as noticed by jasper, and have not been
adequately discussed.


# 1.22 20-Sep-2009 kevlo

add support for VIA Nano cpu core temperature sensor

ok deraadt@


# 1.21 22-Jul-2009 deraadt

via nano cpus are amd64, and so we need machdep.xcrypt


Revision tags: OPENBSD_4_6_BASE
# 1.20 01-Jun-2009 gwk

New VIA nano's support amd64 and EST. Move the setperf init routine outside
of the vendor check for intel and use the EST cpu feature flag to determine
if we should call the est init routine. Tested on matthieu@'s via nano laptop.

ok deraadt@, jsg@


# 1.19 31-May-2009 matthieu

Fix RAMDISK kernels after previous. amd64_has_xcrypt needs to be
#ifdef CRYPTO. noticed by marco@


# 1.18 31-May-2009 matthieu

Add VIA crypto features support to amd64. ok deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.17 16-Feb-2009 krw

Core i7 chips don't have MSR_TEMPERATURE_TARGET register, and blow up
if attempts are made to read it. So read MSR_TEMPERATURE_TARGET only
when ci_model == 0xe.

Found when my Core i7 box blew up. FreeBSD allows a few more chips
but this allows my box to boot.

ok jsg@


# 1.16 16-Feb-2009 jsg

Store conditionally extended cpuid family/model values
in separate variables in struct cpu_info instead
of duplicating the process of extracting it from the signature.

Discussed with several, 'just do it' weingart@, ok mikeb@
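
For illustration only, a minimal sketch of extracting the conditionally
extended family and model from the CPUID(1) signature, following the
convention in Intel's CPUID documentation: the extended model is added
when the base family is 0x6 or 0xf, and the extended family when the
base family is 0xf. The struct and function names are hypothetical and
are not the ones used in struct cpu_info.

```c
#include <stdint.h>

/* Hypothetical container, not the real struct cpu_info fields. */
struct cpu_sig {
	uint32_t family;
	uint32_t model;
};

/* sig is the EAX value returned by CPUID leaf 1. */
static struct cpu_sig
decode_signature(uint32_t sig)
{
	struct cpu_sig s;
	uint32_t base_family = (sig >> 8) & 0x0f;

	s.family = base_family;
	s.model = (sig >> 4) & 0x0f;

	/* Extended model (bits 19:16) becomes the high nibble of the model. */
	if (base_family == 0x6 || base_family == 0xf)
		s.model += ((sig >> 16) & 0x0f) << 4;

	/* Extended family (bits 27:20) is added only for base family 0xf. */
	if (base_family == 0xf)
		s.family += (sig >> 20) & 0xff;

	return s;
}
```

Decoding once and storing the result lets later checks compare against
plain family and model numbers instead of repeating the bit extraction.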


Revision tags: OPENBSD_4_4_BASE
# 1.15 13-Jun-2008 jsg

Detect if Intel's Safer Mode Extensions (SMX) are present,
See http://download.intel.com/technology/security/downloads/31516804.pdf
for more information.

ok deraadt@ 'looks ok to me' djm@


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.14 29-May-2007 tedu

theo says degrees is spelled degrees


# 1.13 29-May-2007 tedu

Some improvements for better intel cpu support.
Add EST support from i386, minus the tables
Also add in support for CPU temperature sensors, based on diff to tech
by Pierre Riteau.
ok deraadt gwk


# 1.12 06-May-2007 gwk

Add the mp setperf mechanism to AMD64, like its i386 counterpart it allows
all cpus in a system supporting frequency and voltage scaling to be scaled
by the same amount corresponding to the user (or apmd on their behalf)
performance level.

This diff also teaches amd64 about acpi_hasprocfvs (ACPI has processor
frequency and voltage scaling).

It also moves initialization of the underlying setperf mechanism such
as powernow to mainbus from the cpu identification and initialization
code, inspired by similar changes dim@ made to i386 during h2k6. This
is necessary to implement the AMD recommended method for retrieving
p_state data from the ACPI _PSS object (a diff coming soon). It will
also simplify the potential addition of enhanced speedstep as found
on newer intel processors with EM64T capable of running OpenBSD/amd64.

MP setperf functionality verified by myself and Johan M:son Lindman <tybolt
AT solace DOT miun DOT se> on opteron 265 and 270 systems respectively.
General testing done by many others, thanks!

ok tedu, dim


Revision tags: OPENBSD_4_1_BASE
# 1.11 17-Feb-2007 tom

Add code to check for the AMD amd64 errata, and correct them where
possible. Taken from NetBSD.

ok deraadt@


# 1.10 13-Feb-2007 jsg

Check for some CPUID flags found on newer Intel processors.
ok tom@ gwk@ krw@


Revision tags: OPENBSD_4_0_BASE
# 1.9 16-Mar-2006 dlg

remove useless powernow cruft from dmesg. we're interested in the
available speed states (which is output separately), not if the cpu can
support them even if the speedstates are not provided.

from gwk, ok deraadt@


# 1.8 08-Mar-2006 uwe

Patch from Gordon Klock to update AMD PowerNow K8 support on i386,
and to add amd64 K8 support from FreeBSD.


# 1.7 07-Mar-2006 jsg

It does not make sense to check for IA64 CPUID flag here.
ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.6 20-Aug-2005 jsg

Check for and report the presence of SSE3. This has started to appear
in AMD products with the arrival of the venice core.
ok deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE
# 1.5 25-Jun-2004 art

SMP support. Big parts from NetBSD, but with some really serious debugging
done by me, niklas and others. Especially wrt. NXE support.

Still needs some polishing, especially in dmesg messages, but we're now
building kernel faster than ever.


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.4 28-Feb-2004 deraadt

sysctl hw.cpuspeed output


# 1.3 27-Feb-2004 grange

Backport from i386 andreas' diff for removing leading and
duplicated spaces from cpu brand string.

ok deraadt@
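
For illustration only, a minimal sketch of the kind of cleanup described:
drop leading spaces and collapse runs of spaces in a brand string, in
place. The function name is hypothetical and this is not the code that
was backported from i386.

```c
/*
 * Hypothetical helper, not from identcpu.c.
 * Removes leading spaces and squeezes each run of spaces down to one,
 * rewriting the NUL-terminated string s in place.
 */
static void
squeeze_spaces(char *s)
{
	char *src = s, *dst = s;

	/* Skip leading spaces. */
	while (*src == ' ')
		src++;

	while (*src != '\0') {
		/* Drop a space that directly follows a space already kept. */
		if (*src == ' ' && dst > s && dst[-1] == ' ') {
			src++;
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}
```

Because the output is never longer than the input, the rewrite can reuse
the same buffer.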


# 1.2 09-Feb-2004 mickey

branches: 1.2.2;
repair cpu dmesg print a bit


# 1.1 28-Jan-2004 mickey

an amd64 arch support.
hacked by art@ from netbsd sources and then later debugged
by me into the shape where it can host itself.
no bootloader yet as needs redoing from the
recent advanced i386 sources (anyone? ;)