History log of /netbsd-current/sys/arch/x86/x86/cpu.c
Revision Date Author Comments
# 1.210 22-Apr-2024 andvar

Surround full mp_cpu_start() method with NLAPIC > 0 guard.

Initialization is based on x86_ipi* functions, which are implemented only
when lapic flag is enabled.


Revision tags: thorpej-ifq-base thorpej-altq-separation-base
# 1.209 16-Jul-2023 riastradh

x86: Sprinkle extensive commentary about %fs/%gs initialization.

Plus some other side quests like the three-stage GDT metamorphosis
lifecycle.

No functional change intended.


# 1.208 03-Mar-2023 riastradh

x86: Call fpuinit_mxcsr_mask only once.

No need to call it again and again on the secondary CPUs to compute
what should be the same mxcsr mask. (If it's not, we have deeper
problems!)


# 1.207 25-Feb-2023 riastradh

x86: Assert kpreempt_disabled() in cpu_load_pmap.

No functional change intended. Just makes it easier to audit
curcpu() usage.


Revision tags: netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base bouyer-sunxi-drm-base
# 1.206 24-Sep-2022 riastradh

x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.


# 1.205 20-Aug-2022 riastradh

x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.


# 1.204 14-Aug-2022 mlelstv

Split TSC calibration into many small steps and disable interrupts
for each step. Also add debug messages.


# 1.203 01-Apr-2022 riastradh

x86, arm: Allow fpu_kern_enter/leave while cold.

Normally these are forbidden above IPL_VM, so that FPU usage doesn't
block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during
builtin module initialization at boot, all interrupts are blocked
anyway so it's a moot point.

Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't
trip over an assertion about it while cold -- the assertion is meant
to detect reentrance into fpu_kern_enter/leave, which is prohibited.

Also initialize cpu0's ci_kfpu_spl.


# 1.202 07-Oct-2021 msaitoh

KNF. No functional change.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 07-Aug-2021 thorpej

Merge thorpej-cfargs2.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base
# 1.200 24-Apr-2021 thorpej

branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.
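
For illustration only, the tag-value calling convention described in this revision can be sketched in plain C with <stdarg.h>. This is a minimal sketch, not the NetBSD implementation: the DEMO_* tags, struct demo_cfargs, and demo_parse_cfargs() are hypothetical names invented here, and only two of the five tags are modelled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags for illustration; not the real NetBSD definitions. */
enum demo_cfarg {
	DEMO_CFARG_EOL = 0,	/* sentinel: end of the argument list */
	DEMO_CFARG_IATTR,	/* followed by a const char * */
	DEMO_CFARG_LOCATORS	/* followed by a const int * */
};

/* Collected arguments; fields stay NULL when the tag was not passed. */
struct demo_cfargs {
	const char *iattr;
	const int *locators;
};

/*
 * Walk (tag, value) pairs until the sentinel is seen.
 * Returns 0 on success, -1 if an unknown tag is encountered.
 */
int
demo_parse_cfargs(struct demo_cfargs *out, int first_tag, ...)
{
	va_list ap;
	int tag = first_tag;
	int error = 0;

	assert(out != NULL);
	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, first_tag);
	while (tag != DEMO_CFARG_EOL && error == 0) {
		switch (tag) {
		case DEMO_CFARG_IATTR:
			out->iattr = va_arg(ap, const char *);
			break;
		case DEMO_CFARG_LOCATORS:
			out->locators = va_arg(ap, const int *);
			break;
		default:
			/* Unknown tag: its value type is unknown, so stop. */
			error = -1;
			break;
		}
		if (error == 0)
			tag = va_arg(ap, int);	/* fetch the next tag */
	}
	va_end(ap);
	return error;
}
```

A caller would pass only the tags it needs and always end with the sentinel, e.g. demo_parse_cfargs(&args, DEMO_CFARG_IATTR, name, DEMO_CFARG_EOL). The appeal of this shape is that new tags can be added without introducing another function variant.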

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by the following modifications:
- Add IDT pages and their allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386(not amd64), it seems the improvement is very little.
- The rdtscp instruction can be used as serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
of the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
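
As a rough illustration of the "measure 1000 times and pick the lowest observed value" step in this revision, here is a minimal sketch in C. All names are hypothetical and are not taken from the kernel sources; the sampler is a stand-in that replays fixed values, and treating a value exactly equal to the threshold as acceptable is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical measurement hook: returns one skew sample in cycles. */
typedef uint64_t (*demo_sample_fn)(void *cookie);

/* Error threshold in clock cycles, as stated in the commit message. */
#define DEMO_SKEW_THRESHOLD 1000

/* Take nsamples measurements and keep the lowest one observed. */
uint64_t
demo_lowest_sample(demo_sample_fn sample, void *cookie, unsigned int nsamples)
{
	uint64_t best = UINT64_MAX;
	unsigned int i;

	assert(sample != NULL && nsamples > 0);
	for (i = 0; i < nsamples; i++) {
		uint64_t v = sample(cookie);

		if (v < best)
			best = v;
	}
	return best;
}

/* Deterministic stand-in sampler that replays a fixed array. */
struct demo_replay {
	const uint64_t *vals;
	size_t n;
	size_t next;
};

uint64_t
demo_replay_sample(void *cookie)
{
	struct demo_replay *r = cookie;
	uint64_t v = r->vals[r->next % r->n];

	r->next++;
	return v;
}

/* Assumption: a lowest sample at or below the threshold is acceptable. */
int
demo_skew_acceptable(uint64_t lowest)
{
	return lowest <= DEMO_SKEW_THRESHOLD;
}
```

Keeping the minimum rather than the mean discards samples inflated by interrupts or cache misses, which only ever add cycles to a measurement.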


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take TSC value from CPUID. Some processors
can take TSC/core crystal clock ratio but core crystal clock frequency
can't be taken. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
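
The arithmetic behind the CPUID 0x15 approach can be sketched as follows. This is an illustrative helper, not the code added in this revision: the function name and the fallback_crystal_hz parameter are hypothetical. It assumes the register layout documented for leaf 0x15 (EAX = ratio denominator, EBX = ratio numerator, ECX = core crystal clock in Hz, which may read as 0), with the caller supplying a per-model crystal frequency when ECX is 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the TSC frequency in Hz from CPUID leaf 0x15 values:
 *   denom      = EAX (denominator of the TSC/crystal ratio)
 *   numer      = EBX (numerator of the TSC/crystal ratio)
 *   crystal_hz = ECX (core crystal clock in Hz, 0 if not reported)
 * fallback_crystal_hz is a caller-supplied per-model value used only
 * when crystal_hz is 0. Returns 0 if the frequency cannot be derived.
 */
uint64_t
demo_tsc_freq_from_cpuid15(uint32_t denom, uint32_t numer,
    uint32_t crystal_hz, uint32_t fallback_crystal_hz)
{
	uint64_t hz = crystal_hz != 0 ? crystal_hz : fallback_crystal_hz;

	if (denom == 0 || numer == 0 || hz == 0)
		return 0;
	/* 64-bit arithmetic: the product does not fit in 32 bits. */
	return hz * numer / denom;
}
```

For example, a 24 MHz crystal with a ratio of 176/2 gives 2112000000 Hz. A return value of 0 tells the caller to fall back to calibration.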


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
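
The two steps above can be modelled in a few lines of C. This is a simplified, single-threaded sketch with hypothetical names, not the kCSan code: it omits the delay and the per-CPU bookkeeping, and it assumes that two reads of the same address are not reported, since only a read/write or write/write pair constitutes a race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sampling interval, mirroring KCSAN_NACCESSES from the commit message. */
#define DEMO_NACCESSES 2000

/* A cell describes one sampled access that is currently being watched. */
struct demo_cell {
	uintptr_t addr;
	size_t size;
	bool write;
	bool active;
};

/* Step 1: select every DEMO_NACCESSES-th access for a new cell. */
bool
demo_select_access(unsigned int *counter)
{
	assert(counter != NULL);
	if (++*counter < DEMO_NACCESSES)
		return false;
	*counter = 0;
	return true;
}

/* Step 2: does this access collide with the watched cell? */
bool
demo_conflict(const struct demo_cell *cell, uintptr_t addr, size_t size,
    bool write)
{
	assert(cell != NULL);
	if (!cell->active)
		return false;
	if (!cell->write && !write)
		return false;	/* assumption: read/read is never a race */
	/* Report when the two byte ranges overlap. */
	return addr < cell->addr + cell->size && cell->addr < addr + size;
}
```

In the cpu0/cpu1 example from the text, cpu0's selected read would populate the cell, and cpu1's write to an overlapping address would make demo_conflict() return true.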

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that the cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
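
The truncation hazard that motivated this rename can be demonstrated with a small C sketch. The demo_* names are hypothetical stand-ins, not the libkern definitions, and the sketch assumes unsigned int is 32 bits wide, as it is on the platforms discussed here.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the renamed function: both operands are unsigned int. */
unsigned int
demo_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Type-generic macro in the style of MIN: no truncation, but each
 * argument may be evaluated more than once.
 */
#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * What happens when 64-bit values are passed to the function form.
 * The explicit casts spell out the conversion that C would otherwise
 * perform silently at the call site.
 */
uint64_t
demo_truncating_min(uint64_t a, uint64_t b)
{
	return demo_uimin((unsigned int)a, (unsigned int)b);
}
```

With a = 0x100000001 and b = 5, the function form compares the truncated value 1 against 5 and returns 1, while DEMO_MIN returns 5. Giving the function a name that states its operand type makes this kind of mismatch easier to spot in review.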


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some systems with a
(broken?) BIOS. Don't panic when a local APIC's interrupt input pin number
(LINTx) > 1. Instead, print a warning message and continue. The default is
pin 1. Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of CPUs
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
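
As a rough illustration of the split between base, extended and display
fields, here is a sketch using the CPUID leaf 1 EAX layout and Intel's
documented combination rule (the extended family is added when the base family
is 0xf; the extended model is used when the base family is 0x6 or 0xf). The
ex_ function names are hypothetical and are not the macros from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static uint32_t ex_stepping(uint32_t sig)    { return sig & 0x0f; }
static uint32_t ex_base_model(uint32_t sig)  { return (sig >> 4) & 0x0f; }
static uint32_t ex_base_family(uint32_t sig) { return (sig >> 8) & 0x0f; }
static uint32_t ex_ext_model(uint32_t sig)   { return (sig >> 16) & 0x0f; }
static uint32_t ex_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field only counts when the base is 0xf. */
static uint32_t
ex_display_family(uint32_t sig)
{
	uint32_t family = ex_base_family(sig);

	if (family == 0x0f)
		family += ex_ext_family(sig);
	return family;
}

/* Display model: the extended field supplies the high nibble. */
static uint32_t
ex_display_model(uint32_t sig)
{
	uint32_t family = ex_base_family(sig);
	uint32_t model = ex_base_model(sig);

	if (family == 0x06 || family == 0x0f)
		model |= ex_ext_model(sig) << 4;
	return model;
}
```

For example, the signature 0x000306a9 decodes to family 6, model 0x3a,
stepping 9.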


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1) the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2) once 1) above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3) pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The hack used had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
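
For readers unfamiliar with the idiom, rounding up to a power-of-two boundary
is usually written as an add followed by a mask. The sketch below shows that
arithmetic as a standalone function; ex_round_up_pow2 is a hypothetical name
used for illustration and is not the kernel's roundup2() definition.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of align.
 * Only valid when align is a non-zero power of two.
 */
static size_t
ex_round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	/* Adding align - 1 crosses the boundary; the mask drops the low bits. */
	return (x + align - 1) & ~(align - 1);
}
```

Naming the operation makes the alignment requirement visible at the call site,
which an open-coded mask tends to hide.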


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
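
The selection logic described above might look roughly like the following
sketch. The ex_ names, the boolean parameters, and the empty routine bodies are
all assumptions made for illustration; the real routines execute privileged
instructions and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

typedef void (*ex_idle_fn)(void);

/* Placeholder idle routines; the real ones would not be empty. */
static void ex_idle_xen(void)   { /* would yield to the hypervisor */ }
static void ex_idle_mwait(void) { /* would use MONITOR/MWAIT */ }
static void ex_idle_halt(void)  { /* would execute HLT */ }

/* The pointer the idle macro dispatches through; HLT is the fallback. */
static ex_idle_fn ex_cpu_idle = ex_idle_halt;
#define ex_idle()	((*ex_cpu_idle)())

/* Pick the idle routine once, at CPU setup time. */
static void
ex_idle_select(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		ex_cpu_idle = ex_idle_xen;
	else if (has_mwait && !is_amd)
		ex_cpu_idle = ex_idle_mwait;
	else
		ex_cpu_idle = ex_idle_halt;
}
```

Choosing once and dispatching through a pointer keeps the feature tests out of
the idle loop itself.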


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
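
The "greatest power of two which is a divisor" step has a compact bit-trick
form: isolating the lowest set bit of the number. The sketch below shows that
arithmetic only; the function name is hypothetical and this is not necessarily
how the commit itself computes the value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the largest power of two that divides n, i.e. the value of the
 * lowest set bit of n. n must be non-zero.
 */
static uint32_t
ex_largest_pow2_divisor(uint32_t n)
{
	assert(n != 0);
	/* ~n + 1 is the two's complement negation in unsigned arithmetic. */
	return n & (~n + 1);
}
```

With 12 desired colors this yields 4, and with 24 it yields 8; a count that is
already a power of two is returned unchanged.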


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shut down other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
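
To show why atomics matter for a shared flags word, here is a sketch using
portable C11 <stdatomic.h> rather than the kernel's own atomic operations. The
structure, flag values, and ex_ function names are hypothetical; only the
read-modify-write pattern is the point.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define EX_FLAG_PRESENT	0x01u	/* hypothetical flag bits */
#define EX_FLAG_RUNNING	0x02u

struct ex_cpu {
	_Atomic uint32_t flags;
};

/* Set bits without losing a concurrent update from another CPU. */
static void
ex_flags_set(struct ex_cpu *ci, uint32_t bits)
{
	atomic_fetch_or(&ci->flags, bits);
}

/* Clear bits with the same guarantee. */
static void
ex_flags_clear(struct ex_cpu *ci, uint32_t bits)
{
	atomic_fetch_and(&ci->flags, ~bits);
}

static uint32_t
ex_flags_get(struct ex_cpu *ci)
{
	return atomic_load(&ci->flags);
}
```

A plain `flags |= bits` compiles to a separate load and store, so two CPUs
updating different bits at the same time could overwrite each other.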


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.209 16-Jul-2023 riastradh

x86: Sprinkle extensive commentary about %fs/%gs initialization.

Plus some other side quests like the three-stage GDT metamorphosis
lifecycle.

No functional change intended.


# 1.208 03-Mar-2023 riastradh

x86: Call fpuinit_mxcsr_mask only once.

No need to call it again and again on the secondary CPUs to compute
what should be the same mxcsr mask. (If it's not, we have deeper
problems!)


# 1.207 25-Feb-2023 riastradh

x86: Assert kpreempt_disabled() in cpu_load_pmap.

No functional change intended. Just makes it easier to audit
curcpu() usage.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.206 24-Sep-2022 riastradh

x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.


# 1.205 20-Aug-2022 riastradh

x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.


# 1.204 14-Aug-2022 mlelstv

Split TSC calibration into many small steps and disable interrupts
for each step. Also add debug messages.


# 1.203 01-Apr-2022 riastradh

x86, arm: Allow fpu_kern_enter/leave while cold.

Normally these are forbidden above IPL_VM, so that FPU usage doesn't
block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during
builtin module initialization at boot, all interrupts are blocked
anyway so it's a moot point.

Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't
trip over an assertion about it while cold -- the assertion is meant
to detect reentrance into fpu_kern_enter/leave, which is prohibited.

Also initialize cpu0's ci_kfpu_spl.


# 1.202 07-Oct-2021 msaitoh

KNF. No functional change.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 07-Aug-2021 thorpej

Merge thorpej-cfargs2.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base
# 1.200 24-Apr-2021 thorpej

branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.
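
The tag-value calling convention described above can be illustrated with a
small stand-alone sketch. This is not the NetBSD autoconf code: the names
TAG_EOL, TAG_IATTR, TAG_LOCATORS, struct sketch_args and parse_args() are
hypothetical, and only two tags are modelled. It only shows how a variadic
list of (tag, value) pairs terminated by a sentinel might be consumed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags; the sentinel ends the argument list. */
enum sketch_tag { TAG_EOL = 0, TAG_IATTR, TAG_LOCATORS };

/* Hypothetical result of parsing one tag-value argument list. */
struct sketch_args {
	const char *iattr;	/* interface attribute name, or NULL */
	const int *locators;	/* locators array, or NULL */
};

/*
 * Walk (tag, value) pairs until the sentinel is seen.  Each tag
 * determines the type of the value that follows it.
 */
static struct sketch_args
parse_args(int first_tag, ...)
{
	struct sketch_args out = { NULL, NULL };
	va_list ap;
	int tag;

	va_start(ap, first_tag);
	for (tag = first_tag; tag != TAG_EOL; tag = va_arg(ap, int)) {
		if (tag == TAG_IATTR) {
			out.iattr = va_arg(ap, const char *);
		} else if (tag == TAG_LOCATORS) {
			out.locators = va_arg(ap, const int *);
		} else {
			/*
			 * Unknown tag: a caller bug.  Abort in debug builds;
			 * otherwise stop, since the value type is unknown.
			 */
			assert(0 && "unknown tag");
			break;
		}
	}
	va_end(ap);
	return out;
}
```

A caller passes only the pairs it needs, for example
parse_args(TAG_IATTR, "bus", TAG_EOL), which is the property that lets a
single entry point replace several fixed-signature variants.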

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and their allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
in the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
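
The "measure many times and keep the lowest value" idea from the list above
can be sketched as follows. This is an illustration only, not the kernel's
TSC code: sample_fn, struct replay, replay_sample() and lowest_of_n() are
hypothetical names, and the sampler simply replays canned numbers instead of
reading any hardware counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sampler: returns one measurement per call. */
typedef int64_t (*sample_fn)(void *);

/* Test sampler that cycles through a fixed array of values. */
struct replay {
	const int64_t *vals;
	size_t n;
	size_t next;
};

static int64_t
replay_sample(void *arg)
{
	struct replay *r = arg;
	int64_t v = r->vals[r->next];

	r->next = (r->next + 1) % r->n;
	return v;
}

/*
 * Take n measurements and return the lowest one observed.  Noise from
 * interrupts or cache misses can only inflate a sample, so the minimum
 * is the least disturbed estimate.
 */
static int64_t
lowest_of_n(sample_fn fn, void *arg, int n)
{
	int64_t best = INT64_MAX;

	assert(n > 0);
	for (int i = 0; i < n; i++) {
		int64_t v = fn(arg);

		if (v < best)
			best = v;
	}
	return best;
}
```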


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take TSC value from CPUID. Some processors
can take TSC/core crystal clock ratio but core crystal clock frequency
can't be taken. Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
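
As a worked example of the arithmetic involved: CPUID leaf 0x15 reports the
TSC/core crystal clock ratio as a numerator and a denominator, and the TSC
frequency is the crystal frequency multiplied by that ratio. The sketch
below does not execute CPUID and is not the code from identcpu_subr.c; the
function name tsc_freq_from_ratio() is hypothetical and the input values in
the usage note are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the TSC frequency in Hz from the crystal frequency and the
 * TSC/crystal ratio (numerator / denominator).  Returns 0 when any
 * input is missing, i.e. reported as zero.
 */
static uint64_t
tsc_freq_from_ratio(uint32_t denominator, uint32_t numerator,
    uint64_t crystal_hz)
{
	if (denominator == 0 || numerator == 0 || crystal_hz == 0)
		return 0;

	/* Guard against overflow of the 64-bit intermediate product. */
	assert(crystal_hz <= UINT64_MAX / numerator);

	return crystal_hz * numerator / denominator;
}
```

With an assumed 24 MHz crystal and a ratio of 176/2, the result is
24000000 * 176 / 2 = 2112000000 Hz. A zero crystal frequency models the
case mentioned above where the ratio is available but the crystal clock
frequency is not.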


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than before.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
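
The truncation hazard described above can be demonstrated with a short
sketch. The names uimin_sketch(), MIN_SKETCH, min_via_function() and
min_via_macro() are hypothetical stand-ins, not the libkern definitions, and
the example assumes a platform where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* Stand-in for a min() that is defined on unsigned int only. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no truncation, but each argument may be evaluated twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* 64-bit values are narrowed to unsigned int before the comparison. */
static uint64_t
min_via_function(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}

/* The macro compares the full 64-bit values. */
static uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
	return MIN_SKETCH(a, b);
}
```

For the inputs 0x100000001 and 5, the function form narrows the first
argument to 1 and returns 1, while the macro form returns 5. The casts are
written out here for clarity; with the generic name the same narrowing
would happen implicitly at the call site, which is the silent behaviour the
rename is meant to expose.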


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
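
The relation between the slot number and the address can be checked with a
few lines of arithmetic. With 4-level paging each L4 slot covers 2^39 bytes,
and an address with bit 47 set must have bits 48..63 set as well to be
canonical. The sketch below is illustrative; l4_slot_base() and the two
constants are hypothetical names, not pmap definitions.

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT_SKETCH		39	/* each L4 slot maps 2^39 bytes */
#define L4_NSLOTS_SKETCH	512	/* entries in one L4 page */

/* Return the canonical base virtual address covered by an L4 slot. */
static uint64_t
l4_slot_base(unsigned int slot)
{
	uint64_t va;

	assert(slot < L4_NSLOTS_SKETCH);
	va = (uint64_t)slot << L4_SHIFT_SKETCH;

	/* Sign-extend bit 47 into bits 48..63 (canonical form). */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff) << 48;
	return va;
}
```

Slot 384 is 0x180, so the shift sets bits 47 and 46, giving 0xc00000000000;
sign extension then yields 0xffffc00000000000.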

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some systems with
a (broken?) BIOS. Don't panic when a local APIC's interrupt input pin number
(LINTx) > 1. Instead, print a warning message and continue. The default is
pin 1. Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
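
The difference between the "display" and "base" values can be illustrated
with a small standalone sketch. The function names below are hypothetical
stand-ins, not the macros from this commit, and the combination rule (add
the extended family only when the base family is 0xf; prepend the extended
model only when the base family is 0x6 or 0xf) is the commonly documented
CPUID convention, assumed here rather than taken from the committed code.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature (EAX). Hypothetical helpers. */
static uint32_t cpuid_stepping(uint32_t sig)    { return sig & 0xf; }
static uint32_t cpuid_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static uint32_t cpuid_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static uint32_t cpuid_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static uint32_t cpuid_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field only counts when base family is 0xf. */
static uint32_t
cpuid_display_family(uint32_t sig)
{
	uint32_t fam = cpuid_base_family(sig);

	if (fam == 0xf)
		fam += cpuid_ext_family(sig);
	return fam;
}

/* Display model: the extended field becomes the high nibble for 0x6/0xf. */
static uint32_t
cpuid_display_model(uint32_t sig)
{
	uint32_t fam = cpuid_base_family(sig);
	uint32_t model = cpuid_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= cpuid_ext_model(sig) << 4;
	return model;
}
```

For example, under these assumptions the signature 0x000306A9 decodes to
family 0x6, model 0x3a, stepping 9, while 0x00800F11 decodes to family 0x17,
model 0x01.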


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
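
As an illustration of the operation being replaced, the sketch below shows
the usual mask arithmetic for rounding up to a power-of-two boundary. The
function name is a hypothetical stand-in and is not taken from this commit;
the arithmetic is only valid when the alignment is a power of two.

```c
#include <stddef.h>

/*
 * Illustrative stand-in for a roundup2()-style helper: round x up to the
 * next multiple of align. Assumes align is a nonzero power of two.
 */
static size_t
roundup_pow2_align(size_t x, size_t align)
{
	/* Adding align - 1 and clearing the low bits rounds upward. */
	return (x + align - 1) & ~(align - 1);
}
```

Naming the operation, instead of spelling out the add-and-mask at each call
site, makes the power-of-two assumption easier to spot during review.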


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
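
The selection order described above can be sketched as follows. This is a
minimal illustration under stated assumptions: the routine names, the
boolean inputs, and the select_idle() helper are all hypothetical and are
not the identifiers used in the commit.

```c
#include <stdbool.h>

typedef void (*idle_fn_t)(void);

/* Empty stand-ins for the three idle routines (hypothetical names). */
static void idle_xen(void)   { }
static void idle_mwait(void) { }
static void idle_halt(void)  { }

/*
 * Pick the idle routine once, so that the idle path itself is a single
 * indirect call: Xen first, then MWAIT on non-AMD CPUs, otherwise HLT.
 */
static idle_fn_t
select_idle(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		return idle_xen;
	if (has_mwait && !is_amd)
		return idle_mwait;
	return idle_halt;
}
```

With this shape, an AMD CPU that advertises MWAIT still ends up with the
halt routine, which matches the power-saving rationale given in the message.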


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
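
One way to compute "the greatest power of two which is a divisor" is to
isolate the lowest set bit of the value. The sketch below assumes that
approach; the function name is hypothetical and the committed code may
compute the result differently (for example with a loop).

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the greatest power of two that divides n.
 * The lowest set bit of n is exactly that divisor; n == 0 yields 0.
 */
static uint32_t
largest_pow2_divisor(uint32_t n)
{
	/* ~n + 1 is the two's complement negation of n. */
	return n & (~n + 1u);
}
```

For instance, a probed value of 24 colors would be reduced to 8, while a
value that is already a power of two is returned unchanged.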


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.
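
The 'lock' prefix patching described above can be illustrated with a minimal
sketch. It works on a plain byte buffer rather than live kernel text, and the
names patch_lock_prefixes, sites and ncpu are hypothetical, not the kernel's
own interface. The only facts relied on are the x86 encodings: 0xF0 is the
LOCK prefix and 0x90 is a one-byte NOP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_LOCK	0xF0	/* x86 LOCK prefix byte */
#define OP_NOP	0x90	/* x86 one-byte NOP */

/*
 * Replace the LOCK prefix at each recorded site with a NOP when only one
 * CPU is attached. Returns the number of bytes patched. "sites" holds
 * offsets into "text" that were recorded at build time (an assumption of
 * this sketch).
 */
static size_t
patch_lock_prefixes(uint8_t *text, const size_t *sites, size_t nsites,
    int ncpu)
{
	size_t i, patched = 0;

	assert(text != NULL && sites != NULL);

	/* On a multiprocessor system the prefixes must stay. */
	if (ncpu > 1)
		return 0;

	for (i = 0; i < nsites; i++) {
		/* Only touch bytes that really are a LOCK prefix. */
		if (text[sites[i]] == OP_LOCK) {
			text[sites[i]] = OP_NOP;
			patched++;
		}
	}
	return patched;
}
```

Because the replacement has the same length as the prefix, no other
instruction moves. A real kernel would also have to make the text writable
and flush caches afterwards, which is the wbinvd step listed above.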


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.208 03-Mar-2023 riastradh

x86: Call fpuinit_mxcsr_mask only once.

No need to call it again and again on the secondary CPUs to compute
what should be the same mxcsr mask. (If it's not, we have deeper
problems!)


# 1.207 25-Feb-2023 riastradh

x86: Assert kpreempt_disabled() in cpu_load_pmap.

No functional change intended. Just makes it easier to audit
curcpu() usage.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.206 24-Sep-2022 riastradh

x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.


# 1.205 20-Aug-2022 riastradh

x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.


# 1.204 14-Aug-2022 mlelstv

Split TSC calibration into many small steps and disable interrupts
for each step. Also add debug messages.


# 1.203 01-Apr-2022 riastradh

x86, arm: Allow fpu_kern_enter/leave while cold.

Normally these are forbidden above IPL_VM, so that FPU usage doesn't
block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during
builtin module initialization at boot, all interrupts are blocked
anyway so it's a moot point.

Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't
trip over an assertion about it while cold -- the assertion is meant
to detect reentrance into fpu_kern_enter/leave, which is prohibited.

Also initialize cpu0's ci_kfpu_spl.


# 1.202 07-Oct-2021 msaitoh

KNF. No functional change.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 07-Aug-2021 thorpej

Merge thorpej-cfargs2.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base
# 1.200 24-Apr-2021 thorpej

branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).
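
The idea of a single call taking a variadic list of tag-value arguments,
ended by a sentinel, can be sketched as follows. This is only an
illustration of the calling pattern: the SK_ prefixed names, the struct and
the parser are hypothetical and are not the autoconf(9) implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags; SK_CFARG_EOL plays the role of the sentinel. */
enum sk_cfarg {
	SK_CFARG_EOL = 0,
	SK_CFARG_IATTR,		/* value: const char * */
	SK_CFARG_LOCATORS	/* value: const int * */
};

struct sk_cfargs {
	const char *iattr;
	const int *locators;
};

/*
 * Collect tag-value pairs until the sentinel is seen.
 * Returns 0 on success, -1 if an unknown tag is encountered.
 */
static int
sk_cfargs_parse(struct sk_cfargs *out, ...)
{
	va_list ap;
	int tag, error = 0;

	assert(out != NULL);
	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, out);
	while (error == 0 && (tag = va_arg(ap, int)) != SK_CFARG_EOL) {
		switch (tag) {
		case SK_CFARG_IATTR:
			out->iattr = va_arg(ap, const char *);
			break;
		case SK_CFARG_LOCATORS:
			out->locators = va_arg(ap, const int *);
			break;
		default:
			/* The value's type is unknown, so stop here. */
			error = -1;
			break;
		}
	}
	va_end(ap);
	return error;
}
```

A caller passes only the tags it cares about, for example
sk_cfargs_parse(&args, SK_CFARG_IATTR, "pci", SK_CFARG_EOL), and omitted
tags keep their defaults. New tags can be added without changing existing
call sites.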


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc using lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386(not amd64), it seems the improvement is very little.
- rdtscp instruction can be used as serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
of the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
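
The "measure 1000 times and pick the lowest observed value" step can be
sketched as below. This is a rough illustration, not the code in tsc.c: the
callback type, the replay helper and the choice to compare by magnitude and
to treat the threshold as inclusive are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKEW_ROUNDS	1000	/* measurements per CPU, per the commit */
#define SKEW_THRESHOLD	1000	/* error threshold in clock cycles */

/* Hypothetical callback that returns one skew measurement in cycles. */
typedef int64_t (*skew_sample_fn)(void *cookie);

static int64_t
magnitude(int64_t v)
{
	return v < 0 ? -v : v;
}

/* Take "rounds" samples and keep the one closest to zero. */
static int64_t
best_skew(skew_sample_fn sample, void *cookie, size_t rounds)
{
	int64_t best, v;
	size_t i;

	assert(sample != NULL && rounds > 0);
	best = sample(cookie);
	for (i = 1; i < rounds; i++) {
		v = sample(cookie);
		if (magnitude(v) < magnitude(best))
			best = v;
	}
	return best;
}

/* Assumed policy: a skew at or below the threshold is tolerated. */
static int
skew_acceptable(int64_t skew)
{
	return magnitude(skew) <= SKEW_THRESHOLD;
}

/* Test helper: replays a fixed list of samples in a loop. */
struct replay {
	const int64_t *vals;
	size_t n, next;
};

static int64_t
replay_sample(void *cookie)
{
	struct replay *r = cookie;

	return r->vals[r->next++ % r->n];
}
```

Keeping the minimum filters out samples inflated by cache misses or
interrupts, since such noise can only make a single measurement look worse,
not better.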


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It was also required to change lapic_per_second to make the LAPIC timer
work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
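
The arithmetic behind the CPUID 0x15 path can be sketched as follows. Leaf
0x15 reports the TSC/crystal ratio as EBX/EAX and, when enumerated, the
crystal frequency in Hz in ECX. The function name and the fallback parameter
(standing in for a per-model value taken from the SDM) are assumptions of
this sketch, and the register values in the example are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * tsc_hz = crystal_hz * numerator / denominator
 *
 * denominator:         CPUID.15H EAX
 * numerator:           CPUID.15H EBX
 * crystal_hz:          CPUID.15H ECX, 0 if the CPU does not report it
 * fallback_crystal_hz: hypothetical per-model value used when ECX is 0
 *
 * Returns 0 when the frequency cannot be derived.
 */
static uint64_t
tsc_hz_from_cpuid15(uint32_t denominator, uint32_t numerator,
    uint32_t crystal_hz, uint32_t fallback_crystal_hz)
{
	uint64_t hz = crystal_hz != 0 ? crystal_hz : fallback_crystal_hz;

	if (denominator == 0 || numerator == 0 || hz == 0)
		return 0;

	/* 32-bit by 32-bit product always fits in 64 bits. */
	return hz * numerator / denominator;
}

/* Worked example: 24 MHz crystal with a 176/2 ratio gives 2112 MHz. */
static void
tsc_hz_example(void)
{
	assert(tsc_hz_from_cpuid15(2, 176, 24000000, 0) == 2112000000ULL);
}
```

Returning 0 lets the caller fall back to a calibrated value when the leaf
is incomplete.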


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that the cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
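
The silent truncation this rename guards against can be shown with a small
sketch. The helper names are hypothetical and only mirror the unsigned int
signature described above. The example assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the function the commit renames: unsigned int only. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving alternative for 64-bit operands. */
static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* What a caller gets when 64-bit values go through the narrow function. */
static uint64_t
truncated_min(uint64_t a, uint64_t b)
{
	/* The casts spell out the conversion C would perform implicitly. */
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}

static void
uimin_truncation_example(void)
{
	uint64_t big = UINT64_C(0x100000001);	/* 4 GiB + 1 */
	uint64_t small = UINT64_C(0x80000000);	/* 2 GiB */

	/* Correct answer: the smaller 64-bit value. */
	assert(min_u64(big, small) == small);
	/* Narrow version: "big" is cut down to 1, which then wins. */
	assert(truncated_min(big, small) == 1);
}
```

With the old generic name, the second result looks like a plausible minimum
at the call site. The uimin name makes the operand width visible.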


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
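
For context on "rdi % 16 = 8": FXSAVE requires its memory operand to be
16-byte aligned and faults otherwise. The check can be sketched as below.
The helper names are hypothetical and this is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

#define FXSAVE_ALIGN	16	/* FXSAVE area must be 16-byte aligned */

/* Nonzero if "addr" is a valid FXSAVE destination as far as alignment goes. */
static int
fxsave_area_ok(uintptr_t addr)
{
	return (addr % FXSAVE_ALIGN) == 0;
}

/* Round an address up to the next 16-byte boundary. */
static uintptr_t
fxsave_align_up(uintptr_t addr)
{
	uintptr_t up;

	up = (addr + FXSAVE_ALIGN - 1) & ~(uintptr_t)(FXSAVE_ALIGN - 1);
	assert(fxsave_area_ok(up));
	return up;
}
```

An address with a remainder of 8, as reported in the message above, fails
this check, which is consistent with the fault being an alignment problem
rather than an FXSAVE availability problem.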


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.
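
The relation between L4 slot 384 and address 0xffffc00000000000 mentioned
above follows from amd64 4-level paging, where each L4 slot covers 2^39
bytes and bit 47 is sign-extended into the upper bits. A minimal sketch,
with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT	39			/* one L4 slot spans 2^39 bytes */
#define L4_NSLOTS	512			/* entries in an L4 page */
#define VA_SIGN_BIT	(UINT64_C(1) << 47)	/* top implemented VA bit */

/* Return the first canonical virtual address covered by an L4 slot. */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va;

	assert(slot < L4_NSLOTS);
	va = (uint64_t)slot << L4_SHIFT;

	/* Upper-half addresses need bits 63..48 set to stay canonical. */
	if (va & VA_SIGN_BIT)
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

For slot 384 the shift gives 0x0000c00000000000, and the sign extension
turns that into 0xffffc00000000000.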


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) is > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so that it does not break the dmesg of
cpus with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
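
The difference between the base, extended and display fields can be sketched as below. This is a minimal illustration of the usual CPUID leaf 1 EAX layout and the vendor-documented combination rule, not a copy of the macros in specialreg.h; the sig_* function names are hypothetical.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when the base is 0xf. */
static inline uint32_t
sig_family(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);

	return base == 0xf ? base + sig_extfamily(sig) : base;
}

/* Display model: the extended model supplies the high nibble for base
 * families 0x6 and 0xf; otherwise only the base model is used. */
static inline uint32_t
sig_model(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (base == 0x6 || base == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, the signature 0x000306a9 decodes to family 0x6, model 0x3a, stepping 9, and 0x00100f42 decodes to family 0x10, model 0x4, stepping 2.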


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
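
Why a dynamic set lifts the CPU limit can be shown with a small sketch. This is not the kcpuset(9) API; the cpuset_sketch_* names and SKETCH_MAXCPUS are hypothetical and only illustrate replacing a single 32-bit mask with an array of words.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAXCPUS		256
#define SKETCH_WORD_BITS	(sizeof(uint32_t) * CHAR_BIT)
#define SKETCH_NWORDS		(SKETCH_MAXCPUS / SKETCH_WORD_BITS)

/* A single uint32_t mask caps the system at 32 CPUs; an array does not. */
typedef struct {
	uint32_t bits[SKETCH_NWORDS];
} cpuset_sketch_t;

/* Mark CPU index 'cpu' (0 <= cpu < SKETCH_MAXCPUS) as a member. */
static inline void
cpuset_sketch_set(cpuset_sketch_t *s, unsigned cpu)
{
	s->bits[cpu / SKETCH_WORD_BITS] |=
	    UINT32_C(1) << (cpu % SKETCH_WORD_BITS);
}

/* Test whether CPU index 'cpu' is a member. */
static inline bool
cpuset_sketch_isset(const cpuset_sketch_t *s, unsigned cpu)
{
	return ((s->bits[cpu / SKETCH_WORD_BITS] >>
	    (cpu % SKETCH_WORD_BITS)) & 1u) != 0;
}
```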


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
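
For readers unfamiliar with the idiom, the operation being replaced looks roughly like this. The macro below is a sketch of the usual power-of-two round-up under the hypothetical name ROUNDUP2_SKETCH; it is assumed, not confirmed by this log, to match what roundup2() computes.

```c
#include <stddef.h>

/* Round x up to the next multiple of m; m must be a power of two. */
#define ROUNDUP2_SKETCH(x, m)	(((x) + ((m) - 1)) & ~((m) - 1))

/* The kind of open-coded expression such a macro replaces. */
static inline size_t
align_up_open_coded(size_t x, size_t m)
{
	return (x + m - 1) & ~(m - 1);
}
```

Both forms give 16 for x = 13, m = 8 and leave an already aligned value unchanged.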


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
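
The selection rule above can be sketched as a function returning the idle routine to install in the function pointer. The idle_* stubs, idle_fn_t and select_idle() are hypothetical names for illustration; the real routines and the feature checks live in the kernel.

```c
#include <stdbool.h>

typedef void (*idle_fn_t)(void);

/* Placeholder bodies; the real routines would yield, MWAIT or HLT. */
static void idle_xen(void)   { }
static void idle_mwait(void) { }
static void idle_halt(void)  { }

/*
 * Pick the idle routine: Xen's own under Xen, MWAIT when the CPU
 * supports it and is not AMD, and HLT in every other case.
 */
static inline idle_fn_t
select_idle(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		return idle_xen;
	if (has_mwait && !is_amd)
		return idle_mwait;
	return idle_halt;
}
```

The chosen pointer is resolved once, so the idle path itself carries no per-call feature tests.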


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
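
The greatest power of two dividing a number is its lowest set bit, which can be isolated with one bitwise operation. A minimal sketch, with the hypothetical name pow2_divisor(); whether cpu.c computes it this way is not stated in this log.

```c
#include <stdint.h>

/*
 * Largest power of two that divides n, for n > 0.
 * n & -n keeps only the lowest set bit; (~n + 1u) is -n in
 * unsigned arithmetic.
 */
static inline uint32_t
pow2_divisor(uint32_t n)
{
	return n & (~n + 1u);
}
```

For instance, a probed count of 24 colors would be reduced to 8, and a count of 12 to 4.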


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shut down other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.207 25-Feb-2023 riastradh

x86: Assert kpreempt_disabled() in cpu_load_pmap.

No functional change intended. Just makes it easier to audit
curcpu() usage.


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.206 24-Sep-2022 riastradh

x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.


# 1.205 20-Aug-2022 riastradh

x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.


# 1.204 14-Aug-2022 mlelstv

Split TSC calibration into many small steps and disable interrupts
for each step. Also add debug messages.


# 1.203 01-Apr-2022 riastradh

x86, arm: Allow fpu_kern_enter/leave while cold.

Normally these are forbidden above IPL_VM, so that FPU usage doesn't
block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during
builtin module initialization at boot, all interrupts are blocked
anyway so it's a moot point.

Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't
trip over an assertion about it while cold -- the assertion is meant
to detect reentrance into fpu_kern_enter/leave, which is prohibited.

Also initialize cpu0's ci_kfpu_spl.


# 1.202 07-Oct-2021 msaitoh

KNF. No functional change.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 07-Aug-2021 thorpej

Merge thorpej-cfargs2.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base
# 1.200 24-Apr-2021 thorpej

branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
in the results comes from that.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
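
The "measure many times, keep the lowest value" step from the list above can
be sketched roughly as follows. This is only an illustration under stated
assumptions: the sampling callback, the ex_* names and the fake noise source
are hypothetical and are not taken from the kernel sources.

```c
#include <stdint.h>

/* Hypothetical callback: returns one skew measurement in clock cycles. */
typedef int64_t (*ex_sample_fn)(void *);

/*
 * Take `rounds` measurements and return the smallest magnitude seen.
 * Interference (interrupts, cache misses) only ever adds latency, so the
 * lowest observed value is taken as the best estimate.
 */
static int64_t
ex_min_abs_sample(ex_sample_fn sample, void *arg, unsigned int rounds)
{
	int64_t best = INT64_MAX;
	unsigned int i;

	for (i = 0; i < rounds; i++) {
		int64_t v = sample(arg);

		if (v < 0)
			v = -v;
		if (v < best)
			best = v;
	}
	return best;
}

/* Fake sampler: a true skew of 120 cycles plus deterministic "noise". */
static int64_t
ex_fake_skew(void *arg)
{
	unsigned int *n = arg;
	int64_t noise = (int64_t)(((*n + 1u) * 7919u) % 1000u);

	(*n)++;
	return 120 + noise;
}

/* Run the fake measurement for the given number of rounds. */
static int64_t
ex_demo_skew(unsigned int rounds)
{
	unsigned int n = 0;

	return ex_min_abs_sample(ex_fake_skew, &n, rounds);
}
```

With the fake sampler, a single measurement is dominated by noise, while
1000 rounds converge on the underlying 120-cycle value.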


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer
work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
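
For reference, the arithmetic behind CPUID leaf 0x15 can be sketched as
below. This is an illustration, not the code added in this revision: the
function name and its parameters are hypothetical, and the caller is assumed
to have already executed CPUID and to supply a crystal frequency from a
per-model table when the leaf reports zero.

```c
#include <stdint.h>

/*
 * CPUID leaf 0x15 reports:
 *   EAX = denominator of the TSC/core crystal clock ratio
 *   EBX = numerator of that ratio
 *   ECX = core crystal clock frequency in Hz (0 if not enumerated)
 *
 * table_crystal_hz is a hypothetical fallback taken from a per-model
 * table, used only when ECX is 0. Returns 0 if the frequency cannot
 * be derived.
 */
static uint64_t
ex_tsc_freq_from_cpuid15(uint32_t denom, uint32_t numer, uint32_t crystal_hz,
    uint64_t table_crystal_hz)
{
	uint64_t hz = crystal_hz != 0 ? crystal_hz : table_crystal_hz;

	if (denom == 0 || numer == 0 || hz == 0)
		return 0;
	return hz * numer / denom;
}
```

For example, a ratio of 176/2 with a 24 MHz crystal gives 2112 MHz.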


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is got in config_interrupts context.
However, since it is required when attaching a device,
it is got earlier than now.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It is ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
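
The truncation hazard described above can be reproduced with a small sketch.
The names below are hypothetical stand-ins, not the libkern definitions, and
the example assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical stand-in for a min() defined on unsigned int only. */
static unsigned int
ex_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: may evaluate arguments twice, but keeps the operand type. */
#define EX_MIN(a, b) ((a) < (b) ? (a) : (b))

/* 64-bit arguments are silently converted to unsigned int on the call. */
static uint64_t
ex_clamp_truncating(uint64_t len, uint64_t limit)
{
	return ex_uimin(len, limit);
}

/* The comparison is done on the full 64-bit values. */
static uint64_t
ex_clamp_exact(uint64_t len, uint64_t limit)
{
	return EX_MIN(len, limit);
}
```

Calling ex_clamp_truncating(0x100000001, 0x200000000) compares the low 32
bits only (1 and 0) and returns 0, while ex_clamp_exact returns 0x100000001.
Naming the function uimin makes that conversion visible at the call site.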


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
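
The relationship between the slot number and the address can be checked with
a short sketch, assuming 4-level paging with 48-bit canonical virtual
addresses (each L4 slot covers 2^39 bytes). The macro and function names are
hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define EX_L4_SHIFT 39	/* each L4 slot maps 2^39 bytes (512 GB) */

/* Compute the base virtual address covered by a given L4 slot. */
static uint64_t
ex_l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << EX_L4_SHIFT;

	/* Canonical form: replicate bit 47 into bits 48..63. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

Slot 384 is 0x180, and 0x180 << 39 is 0x0000c00000000000; bit 47 is set, so
sign extension yields 0xffffc00000000000.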

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
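
The difference between the "base" fields and the "display" values can be
sketched as follows. This is a minimal illustration that assumes the CPUID
leaf 1 EAX layout and the family 0x6/0xf combination rule; the function names
are hypothetical and are not the macros added by this commit.

```c
#include <stdint.h>

/* Hypothetical helpers; field positions assume CPUID leaf 1, EAX. */
static inline uint32_t base_family(uint32_t eax) { return (eax >> 8) & 0xf; }
static inline uint32_t base_model(uint32_t eax)  { return (eax >> 4) & 0xf; }
static inline uint32_t ext_family(uint32_t eax)  { return (eax >> 20) & 0xff; }
static inline uint32_t ext_model(uint32_t eax)   { return (eax >> 16) & 0xf; }
static inline uint32_t stepping(uint32_t eax)    { return eax & 0xf; }

/* Display family: the extended family is added only when the base is 0xf. */
static inline uint32_t display_family(uint32_t eax)
{
	uint32_t fam = base_family(eax);

	return fam == 0xf ? fam + ext_family(eax) : fam;
}

/*
 * Display model: the extended model supplies the high nibble when the
 * base family is 0x6 or 0xf (assumed rule for this sketch).
 */
static inline uint32_t display_model(uint32_t eax)
{
	uint32_t fam = base_family(eax);
	uint32_t model = base_model(eax);

	if (fam == 0x6 || fam == 0xf)
		model |= ext_model(eax) << 4;
	return model;
}
```

For a signature of 0x000306A9 this gives family 0x6, model 0x3A and
stepping 9, whereas the base model alone would be 0xA.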


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.
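
As a rough illustration of the address layout this implies, the sketch below
splits a 32-bit virtual address the way PAE paging with 4 KiB pages does
(2 + 9 + 9 + 12 bits). The struct and function names are hypothetical; the
"flat" index only mirrors the idea of treating the four page-directory pages
as a single 4-page L2 and is not taken from the pmap code.

```c
#include <stdint.h>

/* Hypothetical decomposition of a 32-bit VA under PAE, 4 KiB pages. */
struct pae_va {
	unsigned l3;	/* bits 31:30, index into the 4-entry L3 (PDPT) */
	unsigned pd;	/* bits 29:21, index into one page directory */
	unsigned pt;	/* bits 20:12, index into one page table */
	unsigned off;	/* bits 11:0, byte offset within the page */
};

static inline struct pae_va pae_split(uint32_t va)
{
	struct pae_va v;

	v.l3 = (va >> 30) & 0x3;
	v.pd = (va >> 21) & 0x1ff;
	v.pt = (va >> 12) & 0x1ff;
	v.off = va & 0xfff;
	return v;
}

/*
 * If the four page directories are viewed as one 4-page-long L2, the
 * index into it is simply the top 11 bits: l3 * 512 + pd.
 */
static inline unsigned pae_flat_l2_index(uint32_t va)
{
	return va >> 21;
}
```

Entries are 64 bits wide in this mode, which is what leaves room for a
36-bit physical address and for the NX/XD bit.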

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
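
For reference, rounding up to a power-of-two boundary is usually done with a
mask rather than a division. The function below is a hypothetical stand-in
written for illustration; it is not the kernel's roundup2() macro, and it
assumes the alignment is a non-zero power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of align (align must be a power of 2). */
static inline size_t round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}
```

With a 64-byte alignment, 65 rounds up to 128 and 64 stays at 64.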


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
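
One way to compute "the greatest power of two which is a divisor" is to
isolate the lowest set bit of the count. This is a sketch under that
assumption; the function name is hypothetical and the commit may compute the
value differently.

```c
#include <assert.h>

/* Largest power of two that divides n, i.e. the lowest set bit of n. */
static inline unsigned largest_pow2_divisor(unsigned n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

A probed count of 24 colors would be reduced to 8, and 12 to 4; a count that
is already a power of two is returned unchanged.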


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.206 24-Sep-2022 riastradh

x86: Support EFI runtime services.

This creates a special pmap, efi_runtime_pmap, which avoids setting
PTE_U but allows mappings to lie in what would normally be user VM --
this way we don't fall afoul of SMAP/SMEP when executing EFI runtime
services from CPL 0. SVS does not apply to the EFI runtime pmap.

The mechanism is intended to work with either physical addressing or
virtual addressing; currently the bootloader does physical addressing
but in principle it could be modified to do virtual addressing
instead, if it allocated virtual pages, assigned them in the memory
map, and issued RT->SetVirtualAddressMap.

Not sure pmap_activate_sync and pmap_deactivate_sync are correct,
need more review from an x86 wizard.

If this causes fallout, it can be disabled temporarily without
reverting anything by just making efi_runtime_init return immediately
without doing anything, or by removing options EFI_RUNTIME.

amd64-only for now pending type fixes and testing on i386.


# 1.205 20-Aug-2022 riastradh

x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.


# 1.204 14-Aug-2022 mlelstv

Split TSC calibration into many small steps and disable interrupts
for each step. Also add debug messages.


# 1.203 01-Apr-2022 riastradh

x86, arm: Allow fpu_kern_enter/leave while cold.

Normally these are forbidden above IPL_VM, so that FPU usage doesn't
block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during
builtin module initialization at boot, all interrupts are blocked
anyway so it's a moot point.

Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't
trip over an assertion about it while cold -- the assertion is meant
to detect reentrance into fpu_kern_enter/leave, which is prohibited.

Also initialize cpu0's ci_kfpu_spl.
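
The -1 sentinel described above can be illustrated with a small userland
model. This is only a sketch under stated assumptions: struct model_cpu, the
model_* functions and the integer "spl" values are hypothetical stand-ins and
are not the kernel's cpu_info, fpu_kern_enter or fpu_kern_leave.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SPL_NONE	(-1)	/* sentinel: no FPU section is open */

/* Hypothetical stand-in for the per-CPU state; not struct cpu_info. */
struct model_cpu {
	int kfpu_spl;	/* level saved on enter, MODEL_SPL_NONE when closed */
	int cur_spl;	/* pretend current interrupt priority level */
};

static void
model_cpu_init(struct model_cpu *ci)
{
	assert(ci != NULL);
	ci->kfpu_spl = MODEL_SPL_NONE;	/* must be set before the first enter */
	ci->cur_spl = 0;
}

/* Returns 0 on success, -1 if a section is already open (reentrance). */
static int
model_fpu_enter(struct model_cpu *ci, int raised_spl)
{
	if (ci->kfpu_spl != MODEL_SPL_NONE)
		return -1;
	ci->kfpu_spl = ci->cur_spl;	/* remember the level to restore */
	ci->cur_spl = raised_spl;
	return 0;
}

/* Returns 0 on success, -1 if no section is open. */
static int
model_fpu_leave(struct model_cpu *ci)
{
	if (ci->kfpu_spl == MODEL_SPL_NONE)
		return -1;
	ci->cur_spl = ci->kfpu_spl;	/* restore the saved level */
	ci->kfpu_spl = MODEL_SPL_NONE;
	return 0;
}
```

In this model a zero-initialized kfpu_spl would look like an open section, so
the first enter would be rejected; that is the kind of false positive the
explicit -1 initialization avoids.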


# 1.202 07-Oct-2021 msaitoh

KNF. No functional change.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 07-Aug-2021 thorpej

Merge thorpej-cfargs2.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base
# 1.200 24-Apr-2021 thorpej

branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.
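
The tag-value calling convention can be sketched in plain C as follows. This
is an illustration only: the MODEL_CFARG_* tags, their numeric values, the
value types attached to each tag, and model_cfargs_parse() are hypothetical
and are not the real config_found()/config_search() interface.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags modelled on the names in the log; values are invented. */
enum model_cfarg {
	MODEL_CFARG_EOL = 0,		/* sentinel, ends the list */
	MODEL_CFARG_IATTR,		/* followed by a const char * */
	MODEL_CFARG_LOCATORS		/* followed by a const int * */
};

struct model_cfargs {
	const char *iattr;
	const int *locators;
};

/* Collect tag-value pairs until the sentinel; -1 on an unknown tag. */
static int
model_cfargs_parse(struct model_cfargs *out, ...)
{
	va_list ap;
	int tag, error = 0;

	assert(out != NULL);
	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, out);
	while (error == 0 && (tag = va_arg(ap, int)) != MODEL_CFARG_EOL) {
		switch (tag) {
		case MODEL_CFARG_IATTR:
			out->iattr = va_arg(ap, const char *);
			break;
		case MODEL_CFARG_LOCATORS:
			out->locators = va_arg(ap, const int *);
			break;
		default:
			error = -1;	/* cannot know the value's type */
			break;
		}
	}
	va_end(ap);
	return error;
}
```

A caller passes only the tags it needs, for example
model_cfargs_parse(&args, MODEL_CFARG_IATTR, "pci", MODEL_CFARG_EOL), which
is the simplification the merge describes: one entry point instead of one
variant per argument combination.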

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386(not amd64), it seems the improvement is very little.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel and AMD's document say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
of the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
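
The "measure 1000 times and pick the lowest observed value" step can be
sketched as below. Assumptions: "lowest" is read here as smallest magnitude,
the threshold comparison is inclusive, and all model_* names (including the
replay sampler, which stands in for a real cross-CPU TSC comparison) are
hypothetical rather than taken from tsc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SKEW_ROUNDS	1000	/* measurements per CPU pair */
#define MODEL_SKEW_THRESHOLD	1000	/* tolerated error, in clock cycles */

typedef int64_t (*model_sample_fn)(void *cookie);

static int64_t
model_abs64(int64_t v)
{
	return v < 0 ? -v : v;
}

/* Take `rounds` samples and keep the one with the smallest magnitude. */
static int64_t
model_best_skew(model_sample_fn sample, void *cookie, int rounds)
{
	int64_t best, cur;
	int i;

	assert(sample != NULL && rounds > 0);
	best = sample(cookie);
	for (i = 1; i < rounds; i++) {
		cur = sample(cookie);
		if (model_abs64(cur) < model_abs64(best))
			best = cur;	/* least disturbed measurement so far */
	}
	return best;
}

/* Nonzero if the skew is within the (assumed inclusive) threshold. */
static int
model_skew_ok(int64_t skew)
{
	return model_abs64(skew) <= MODEL_SKEW_THRESHOLD;
}

/* Test double: replays a fixed list of samples in a loop. */
struct model_replay {
	const int64_t *vals;
	int n;
	int pos;
};

static int64_t
model_replay_sample(void *cookie)
{
	struct model_replay *r = cookie;
	int64_t v = r->vals[r->pos];

	r->pos = (r->pos + 1) % r->n;
	return v;
}
```

Taking the minimum rather than the mean discards samples inflated by cache
misses or interrupts, which only ever add delay to a measurement.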


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency; the Intel SDM gives the values for some such processors.
- This also requires changing lapic_per_second so that the LAPIC timer works
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
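
The arithmetic behind the CPUID leaf 0x15 path can be sketched as follows.
Leaf 0x15 reports the ratio denominator in EAX, the numerator in EBX and the
core crystal clock in Hz in ECX (which may be zero). The function name and
the fallback_crystal_hz parameter are hypothetical; the caller is assumed to
supply a per-model crystal frequency from the SDM when ECX is zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * TSC frequency = crystal Hz * numerator / denominator.
 * Returns 0 when the leaf does not give enough information.
 */
static uint64_t
model_tsc_freq_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx,
    uint64_t fallback_crystal_hz)
{
	uint64_t crystal = ecx != 0 ? (uint64_t)ecx : fallback_crystal_hz;

	if (eax == 0 || ebx == 0 || crystal == 0)
		return 0;	/* ratio or crystal clock not enumerated */
	return crystal * ebx / eax;
}
```

For example, a 24 MHz crystal with a ratio of 200/2 gives 2400000000 Hz. A
result of 0 means the caller has to fall back to calibration.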


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.
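
The two checks above can be modelled in a few lines of single-threaded C.
This is a sketch, not the kCSan implementation: struct model_cell and the
model_* functions are hypothetical, the delay is left out, and the rule that
two reads never conflict is an assumption taken from the usual definition of
a data race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NACCESSES	2000	/* sample one access out of this many */

/* A published description of one in-flight, sampled access. */
struct model_cell {
	uintptr_t addr;
	size_t size;
	bool write;
	bool active;
};

/* True on every MODEL_NACCESSES-th call for a given counter. */
static bool
model_should_sample(unsigned long *counter)
{
	assert(counter != NULL);
	return ++*counter % MODEL_NACCESSES == 0;
}

/* Half-open ranges [a1, a1+s1) and [a2, a2+s2); ignores wraparound. */
static bool
model_overlap(uintptr_t a1, size_t s1, uintptr_t a2, size_t s2)
{
	return a1 < a2 + s2 && a2 < a1 + s1;
}

/* Would this access collide with the cell another CPU published? */
static bool
model_cell_conflict(const struct model_cell *c, uintptr_t addr, size_t size,
    bool write)
{
	assert(c != NULL);
	if (!c->active)
		return false;
	if (!c->write && !write)
		return false;	/* assumption: read/read is not a race */
	return model_overlap(c->addr, c->size, addr, size);
}
```

This mirrors the cpu0/cpu1 example: a read cell covering an address plus a
later write to the same address yields a conflict, while a second read does
not.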

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that the cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
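
The difference between the two kinds of min can be shown with a small
example. It assumes a platform where unsigned int is 32 bits wide; the
model_* names are local to this sketch and only imitate the libkern function
and the macro quoted above.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* Same shape as a function defined on unsigned int. */
static unsigned int
model_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-generic macro: evaluates arguments twice, but never truncates. */
#define MODEL_MIN(a, b)	((a) < (b) ? (a) : (b))

/* 64-bit values are narrowed to 32 bits before the comparison. */
static uint64_t
model_clamp_with_function(uint64_t len, uint64_t limit)
{
	return model_uimin((unsigned int)len, (unsigned int)limit);
}

/* The comparison happens at full 64-bit width. */
static uint64_t
model_clamp_with_macro(uint64_t len, uint64_t limit)
{
	return MODEL_MIN(len, limit);
}
```

With len = 0x100000001 and limit = 16, the function version yields 1 (the
upper 32 bits are dropped first), while the macro version yields 16. In real
code the narrowing casts are implicit, which is why the old generic name hid
the problem.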

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.
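
The same-size constraint on the two chunks can be modelled on a plain byte
buffer. This is a userland sketch, not the kernel's x86_patch machinery: the
model_* functions are hypothetical, the placeholder is assumed to use the
two-byte short jump encoding (0xEB rel8), and no real code is modified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_JMP_SHORT	0xEB	/* jmp rel8 */
#define MODEL_INT3	0xCC	/* traps if ever executed */

/* Fill a patch site with "jmp over the rest" followed by int3 padding. */
static int
model_make_placeholder(uint8_t *buf, size_t len)
{
	assert(buf != NULL);
	if (len < 2 || len > 129)
		return -1;	/* rel8 can skip at most 127 bytes */
	buf[0] = MODEL_JMP_SHORT;
	buf[1] = (uint8_t)(len - 2);	/* relative to the next instruction */
	memset(buf + 2, MODEL_INT3, len - 2);
	return 0;
}

/* Overwrite the site only if the replacement has exactly the same size. */
static int
model_hotpatch(uint8_t *dst, size_t dstlen, const uint8_t *src, size_t srclen)
{
	assert(dst != NULL && src != NULL);
	if (dstlen != srclen)
		return -1;	/* would corrupt or leave stale bytes */
	memcpy(dst, src, srclen);
	return 0;
}
```

Rejecting a size mismatch is the point of the model: a shorter replacement
would leave int3 bytes in the instruction stream, and a longer one would
overwrite whatever follows the site.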

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of CPUs
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
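
The distinction between the base, extended and display fields can be
illustrated with a small sketch. The names below (struct cpu_ident,
decode_signature) are hypothetical and are not the kernel macros; the bit
positions and the combination rule follow the Intel convention for the
CPUID leaf 1 signature in %eax, where the extended family is added only
when the base family is 0xf, and the extended model is used only when the
base family is 0x6 or 0xf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded values (illustration only). */
struct cpu_ident {
	uint32_t family;	/* display family */
	uint32_t model;		/* display model */
	uint32_t stepping;
};

/* Decode a CPUID leaf 1 signature (the value returned in %eax). */
static inline void
decode_signature(uint32_t sig, struct cpu_ident *out)
{
	uint32_t base_family = (sig >> 8) & 0xf;
	uint32_t base_model = (sig >> 4) & 0xf;
	uint32_t ext_family = (sig >> 20) & 0xff;
	uint32_t ext_model = (sig >> 16) & 0xf;

	assert(out != NULL);

	/* The extended family only counts when the base family is 0xf. */
	out->family = base_family;
	if (base_family == 0xf)
		out->family += ext_family;

	/* The extended model only counts for base family 0x6 or 0xf. */
	out->model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		out->model |= ext_model << 4;

	out->stepping = sig & 0xf;
}
```

With this sketch, a signature of 0x000906E9 decodes to family 0x6,
model 0x9E, stepping 0x9, while 0x00800F11 decodes to family 0x17, model 0x1.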


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
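
For context, rounding up to a power-of-two boundary is usually written as
an add followed by a mask. The helper below is a hypothetical stand-alone
sketch of that idiom, not the kernel macro itself; it assumes the
alignment is a non-zero power of two and checks that with an assertion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): round x up to the next
 * multiple of align, where align must be a non-zero power of two.
 */
static inline uintptr_t
round_up_pow2(uintptr_t x, uintptr_t align)
{
	/* A power of two has exactly one bit set. */
	assert(align != 0 && (align & (align - 1)) == 0);

	/* Add align - 1, then clear the low bits. */
	return (x + (align - 1)) & ~(align - 1);
}
```

Values already on a boundary are returned unchanged, so
round_up_pow2(64, 64) is 64 and round_up_pow2(65, 64) is 128.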


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
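
The "greatest power of two which is a divisor" of a number is its lowest
set bit. The sketch below shows one common way to compute it; the function
name is hypothetical and the commit itself may well use a different
formulation (for example a loop), so treat this as an illustration of the
idea only.

```c
#include <assert.h>

/*
 * Hypothetical helper (illustration only): return the largest power of
 * two that divides n, i.e. the lowest set bit of n. n must be non-zero.
 */
static inline unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);

	/* ~n + 1 is the two's complement negation of n in unsigned math. */
	return n & (~n + 1u);
}
```

For example, 24 colors would be reduced to 8 and 12 colors to 4, while a
count that is already a power of two, such as 16, is left unchanged.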


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.
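
A minimal userland sketch of the 'lock' prefix patching idea, assuming the
patch sites are recorded as a table of byte offsets. The function name, the
site table and the byte buffer are hypothetical; the real kernel code also
has to deal with write protection and cache flushing, which this omits:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_LOCK	0xf0	/* x86 'lock' prefix byte */
#define OP_NOP	0x90	/* x86 one-byte nop */

/*
 * Replace each recorded 'lock' prefix in text[] with a nop, but only
 * when a single CPU attached.  Returns the number of bytes patched.
 * sites[] holds hypothetical byte offsets of the prefixes.
 */
static size_t
patch_lock_prefixes(uint8_t *text, const size_t *sites, size_t nsites,
    unsigned int ncpu)
{
	size_t i;

	if (ncpu > 1)
		return 0;	/* multiprocessor: the prefixes must stay */

	for (i = 0; i < nsites; i++) {
		/* Refuse to patch a byte that is not a lock prefix. */
		assert(text[sites[i]] == OP_LOCK);
		text[sites[i]] = OP_NOP;
	}
	return nsites;
}
```

Deciding after the CPUs have attached is what lets one kernel image serve
both cases: the check on ncpu happens once, at patch time, rather than at
every locked instruction.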


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.205 20-Aug-2022 riastradh

x86: Split most of pmap.h into pmap_private.h or vmparam.h.

This way pmap.h only contains the MD definition of the MI pmap(9)
API, which loads of things in the kernel rely on, so changing x86
pmap internals no longer requires recompiling the entire kernel every
time.

Callers needing these internals must now use machine/pmap_private.h.
Note: This is not x86/pmap_private.h because it contains three parts:

1. CPU-specific (different for i386/amd64) definitions used by...

2. common definitions, including Xenisms like xpmap_ptetomach,
further used by...

3. more CPU-specific inlines for pmap_pte_* operations

So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h
for 2, and then defines 3. Maybe we should split that out into a new
pmap_pte.h to reduce this trouble.

No functional change intended, other than that some .c files must
include machine/pmap_private.h when previously uvm/uvm_pmap.h
polluted the namespace with pmap internals.

Note: This migrates part of i386/pmap.h into i386/vmparam.h --
specifically the parts that are needed for several constants defined
in vmparam.h:

VM_MAXUSER_ADDRESS
VM_MAX_ADDRESS
VM_MAX_KERNEL_ADDRESS
VM_MIN_KERNEL_ADDRESS

Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64
too, just to keep things parallel.


# 1.204 14-Aug-2022 mlelstv

Split TSC calibration into many small steps and disable interrupts
for each step. Also add debug messages.


# 1.203 01-Apr-2022 riastradh

x86, arm: Allow fpu_kern_enter/leave while cold.

Normally these are forbidden above IPL_VM, so that FPU usage doesn't
block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during
builtin module initialization at boot, all interrupts are blocked
anyway so it's a moot point.

Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't
trip over an assertion about it while cold -- the assertion is meant
to detect reentrance into fpu_kern_enter/leave, which is prohibited.

Also initialize cpu0's ci_kfpu_spl.


# 1.202 07-Oct-2021 msaitoh

KNF. No functional change.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 07-Aug-2021 thorpej

Merge thorpej-cfargs2.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base
# 1.200 24-Apr-2021 thorpej

branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.
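
The tag-value calling convention can be sketched in userland C as follows.
This is only an illustration of the technique: the SK_* tags, struct sk_args
and sk_parse() are hypothetical names, not the kernel's CFARG_* definitions
or the real config_found() implementation:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags; these are NOT the kernel's CFARG_* values. */
enum sk_tag {
	SK_EOL = 0,	/* sentinel: ends the argument list */
	SK_IATTR,	/* followed by a const char * */
	SK_LOCATORS	/* followed by a const int * */
};

/* Collected arguments; anything the caller did not pass stays NULL. */
struct sk_args {
	const char *iattr;
	const int *locators;
};

/* Walk tag-value pairs until the sentinel tag is seen. */
static struct sk_args
sk_parse(int first_tag, ...)
{
	struct sk_args args = { NULL, NULL };
	va_list ap;
	int tag;

	va_start(ap, first_tag);
	for (tag = first_tag; tag != SK_EOL; tag = va_arg(ap, int)) {
		switch (tag) {
		case SK_IATTR:
			args.iattr = va_arg(ap, const char *);
			break;
		case SK_LOCATORS:
			args.locators = va_arg(ap, const int *);
			break;
		default:
			assert(!"unknown tag");
			break;
		}
	}
	va_end(ap);
	return args;
}
```

A caller that needs nothing special passes only the sentinel, as in
sk_parse(SK_EOL), and a new tag can be added later without touching existing
call sites, which is the extensibility the commit message describes.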

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc using lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386(not amd64), it seems the improvement is very little.
- The rdtscp instruction can be used as serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel and AMD's document say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
of the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
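
The "measure many times, keep the lowest value" step can be sketched as
below. The callback type, lowest_of() and the replay sampler are hypothetical
helpers for illustration; the kernel's actual skew measurement code is not
shown here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: performs one measurement and returns it. */
typedef uint64_t (*sample_fn)(void *);

/* Repeat a noisy measurement n times and keep the lowest observed value. */
static uint64_t
lowest_of(sample_fn sample, void *arg, unsigned int n)
{
	uint64_t best = UINT64_MAX;
	unsigned int i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		uint64_t v = sample(arg);

		if (v < best)
			best = v;
	}
	return best;
}

/* Demo sampler: replays canned values in place of a real reading. */
struct replay {
	const uint64_t *vals;
	unsigned int pos;
};

static uint64_t
replay_sample(void *arg)
{
	struct replay *r = arg;

	return r->vals[r->pos++];
}
```

Taking the minimum rather than the mean fits this kind of measurement
because interference (interrupts, cache misses) can only make a sample
larger, so the smallest sample is the one least affected by noise.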


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- This also required changing lapic_per_second to make the LAPIC timer work
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
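
As a sketch of the arithmetic involved: CPUID leaf 0x15 reports the
denominator of the TSC/core crystal clock ratio in EAX, the numerator in
EBX, and the crystal frequency in Hz in ECX (which may read as 0). The
helper below is a hypothetical illustration that takes those values as
plain arguments; it is not the code added in identcpu_subr.c:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the TSC frequency in Hz from CPUID leaf 0x15 values:
 *   eax        = denominator of the TSC/core crystal clock ratio
 *   ebx        = numerator of that ratio
 *   crystal_hz = core crystal clock frequency (from ECX, or from a
 *                per-model table when the CPU reports 0)
 * Returns 0 when any input is missing.
 */
static uint64_t
tsc_freq_from_cpuid15(uint32_t eax, uint32_t ebx, uint64_t crystal_hz)
{
	if (eax == 0 || ebx == 0 || crystal_hz == 0)
		return 0;

	/* The multiplication below must not overflow 64 bits. */
	assert(crystal_hz <= UINT64_MAX / ebx);
	return crystal_hz * ebx / eax;
}
```

For example, a 24 MHz crystal with a ratio of 168/2 gives
24000000 * 168 / 2 = 2016000000 Hz.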


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is got in config_interrupts context.
However, since it is required when attaching a device,
it is got earlier than now.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

	fpu_kern_enter();
	/* do FPU stuff */
	fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
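
The two steps can be modelled in a single-threaded userland sketch. The
names (sk_cell, sk_cell_publish, sk_access_races), the fixed CPU count and
the rule that two reads never conflict are assumptions made for this
illustration; the delay and the real per-CPU bookkeeping are left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NCPU	2	/* hypothetical CPU count for the sketch */

/* One cell per CPU, describing the access that CPU is delaying on. */
struct sk_cell {
	uintptr_t addr;
	size_t size;
	bool write;
	bool valid;
};

static struct sk_cell sk_cells[SK_NCPU];

/* Step 1: a sampled access publishes a cell (the real code then delays). */
static void
sk_cell_publish(unsigned int cpu, uintptr_t addr, size_t size, bool write)
{
	assert(cpu < SK_NCPU);
	sk_cells[cpu] = (struct sk_cell){ addr, size, write, true };
}

/* The cell is withdrawn once the delay is over. */
static void
sk_cell_clear(unsigned int cpu)
{
	assert(cpu < SK_NCPU);
	sk_cells[cpu].valid = false;
}

/*
 * Step 2: every access checks the other CPUs' cells.  A race is
 * reported when the byte ranges overlap and at least one side writes.
 */
static bool
sk_access_races(unsigned int cpu, uintptr_t addr, size_t size, bool write)
{
	unsigned int i;

	for (i = 0; i < SK_NCPU; i++) {
		const struct sk_cell *c = &sk_cells[i];

		if (i == cpu || !c->valid)
			continue;
		if (!write && !c->write)
			continue;	/* two reads do not conflict */
		if (addr < c->addr + c->size && c->addr < addr + size)
			return true;
	}
	return false;
}
```

In the example from the text, cpu0 publishes a read cell and a later write
by cpu1 to an overlapping address makes sk_access_races() return true.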

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that the cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
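
The difference can be seen in a small userland sketch, assuming a platform
where unsigned int is 32 bits wide. The names uimin_sketch and MIN_SKETCH
are hypothetical stand-ins chosen to avoid clashing with system headers;
they are not the libkern definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a min() defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical stand-in for a type-generic MIN-style macro. */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/*
 * Returns 1 if passing 64-bit values through the unsigned int helper
 * gives a different answer than the macro does, 0 otherwise.
 */
static int
truncation_changes_result(uint64_t a, uint64_t b)
{
	uint64_t narrow, wide;

	/* This sketch assumes a 32-bit unsigned int. */
	assert(sizeof(unsigned int) == 4);

	/* The casts spell out what the compiler would do silently. */
	narrow = uimin_sketch((unsigned int)a, (unsigned int)b);
	wide = MIN_SKETCH(a, b);

	return narrow != wide;
}
```

With a = 0x100000001 and b = 2, the function-based version sees a as 1 and
returns 1, while the macro compares the full 64-bit values and returns 2.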

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

	jmp	1f
	int3
	int3
	int3
	... int3 ...
1:

gets hotpatched to:

	movq	SVS_UTLS+UTLS_KPDIRPA,%rax
	movq	%rax,%cr3
	movq	CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.
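
A rough userland sketch of the layout change, assuming a 64-byte cache
line: the temporary VA slots live inside a per-CPU structure that is
allocated only when a CPU attaches, and the slot array starts on a
cache-line boundary. struct cpu_sketch, NTMPVA and cpu_sketch_attach()
are hypothetical and do not reflect the real struct cpu_info.

```c
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE_SIZE 64	/* assumed line size */
#define NTMPVA 4		/* hypothetical number of tmp VA slots */

struct cpu_sketch {
	int index;
	/* Slots start on their own cache line, so CPUs do not share it. */
	alignas(CACHE_LINE_SIZE) uintptr_t tmp_va[NTMPVA];
};

/* Allocate per-CPU state on attach instead of sizing a global by maxcpus. */
static struct cpu_sketch *
cpu_sketch_attach(int index)
{
	struct cpu_sketch *ci;

	/* sizeof(*ci) is a multiple of its alignment, as aligned_alloc needs. */
	ci = aligned_alloc(CACHE_LINE_SIZE, sizeof(*ci));
	if (ci == NULL)
		return NULL;
	memset(ci, 0, sizeof(*ci));
	ci->index = index;
	return ci;
}
```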


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).
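
As a sketch of the enabling logic, assuming the architectural bit
positions (SMEP is reported in CPUID leaf 7, sub-leaf 0, EBX bit 7, and
enabled through CR4 bit 20): the helpers below work on plain integers so
they can be exercised in userland. The names are hypothetical and this is
not the code from cpu.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CPUID7_EBX_SMEP	(UINT32_C(1) << 7)	/* CPUID.7.0:EBX bit 7 */
#define SKETCH_CR4_SMEP		(UINT64_C(1) << 20)	/* CR4.SMEP */

/* True if the leaf-7 EBX value advertises SMEP. */
static bool
sketch_has_smep(uint32_t cpuid7_ebx)
{
	return (cpuid7_ebx & SKETCH_CPUID7_EBX_SMEP) != 0;
}

/* Return the CR4 value to load: set SMEP only when the CPU supports it. */
static uint64_t
sketch_cr4_with_smep(uint64_t cr4, uint32_t cpuid7_ebx)
{
	return sketch_has_smep(cpuid7_ebx) ? (cr4 | SKETCH_CR4_SMEP) : cr4;
}
```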


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so that it does not break the dmesg of
CPUs with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all CPUs; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the FPU non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the FPU registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
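
For illustration, a sketch of how display family and display model are
conventionally derived from the CPUID leaf 1 EAX signature, following the
rule in the Intel SDM: the extended family is added only when the base
family is 0xf, and the extended model is prepended when the base family
is 0x6 or 0xf. The sig_* helper names are hypothetical and are not the
macros from specialreg.h, whose exact definitions may differ.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family counts only for base family 0xf. */
static inline uint32_t
sig_family(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);

	return base == 0xf ? base + sig_extfamily(sig) : base;
}

/* Display model: the extended model supplies the high nibble. */
static inline uint32_t
sig_model(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (base == 0x6 || base == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, the signature 0x000306a9 decodes to family 0x6, model 0x3a,
stepping 9, and 0x00800f11 decodes to family 0x17, model 0x01, stepping 1.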


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by the kernel and MMU.
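
The sizes quoted above follow directly from the address widths. A tiny
sketch of the arithmetic, using hypothetical names, which also shows why
a 32-bit type can no longer hold a physical address under PAE:

```c
#include <stdint.h>

#define SKETCH_PAE_PHYS_BITS	36	/* physical address width with PAE */
#define SKETCH_VA_BITS		32	/* virtual address width on i386 */

/* Number of bytes addressable with the given address width. */
static uint64_t
addr_space_bytes(unsigned bits)
{
	return UINT64_C(1) << bits;
}

/* What is left of a physical address when stored in a 32-bit variable. */
static uint32_t
truncate_to_32(uint64_t pa)
{
	return (uint32_t)pa;
}
```

Here addr_space_bytes(36) is 64 GiB and addr_space_bytes(32) is 4 GiB,
while truncate_to_32() silently drops the top four bits of any address
above 4 GiB.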

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
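
For reference, a sketch of the usual power-of-two round-up idiom that a
macro like roundup2() wraps. round_up_pow2() is a hypothetical helper
written for this note, not the definition from sys/param.h.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of align; align must be a power of two. */
static size_t
round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}
```

The mask trick only works for power-of-two alignments, which is why the
precondition is asserted rather than assumed.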


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
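
The selection rule described above can be sketched as a function that
picks an idle routine once and returns it as a function pointer. The
routine bodies are empty placeholders and every name here is
hypothetical; the real cpu_idle hook and its setup code are not shown in
this log.

```c
#include <stdbool.h>

typedef void (*idle_fn)(void);

/* Placeholder idle routines; real ones would execute hlt, mwait, etc. */
static void idle_xen(void)   { }
static void idle_mwait(void) { }
static void idle_halt(void)  { }

/*
 * Xen gets its own routine; otherwise use mwait only when the CPU has
 * it and is not AMD; fall back to halt in every other case.
 */
static idle_fn
select_idle(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		return idle_xen;
	if (has_mwait && !is_amd)
		return idle_mwait;
	return idle_halt;
}
```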


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
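
One common way to compute the greatest power of two that divides a
number is to isolate its lowest set bit. The sketch below assumes that
approach; largest_pow2_divisor() is a hypothetical name and the commit
itself may compute the value differently.

```c
#include <assert.h>

/* Greatest power of two dividing n, i.e. the lowest set bit of n. */
static unsigned
largest_pow2_divisor(unsigned n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

For 24 colors this yields 8, and for a count that is already a power of
two, such as 64, the value is returned unchanged.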


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
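
A sketch of the pattern using C11 atomics in place of the kernel's
atomic operations: flag bits are set and cleared with atomic
read-modify-write operations, so concurrent updates from different CPUs
cannot lose bits. struct cpu_state and the flag names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_FLAG_PRESENT	0x1u	/* hypothetical flag bits */
#define SKETCH_FLAG_RUNNING	0x2u

struct cpu_state {
	_Atomic uint32_t flags;
};

/* Atomically set the given bits. */
static void
flags_set(struct cpu_state *cs, uint32_t bits)
{
	atomic_fetch_or(&cs->flags, bits);
}

/* Atomically clear the given bits. */
static void
flags_clear(struct cpu_state *cs, uint32_t bits)
{
	atomic_fetch_and(&cs->flags, ~bits);
}
```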


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.204 14-Aug-2022 mlelstv

Split TSC calibration into many small steps and disable interrupts
for each step. Also add debug messages.


# 1.203 01-Apr-2022 riastradh

x86, arm: Allow fpu_kern_enter/leave while cold.

Normally these are forbidden above IPL_VM, so that FPU usage doesn't
block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during
builtin module initialization at boot, all interrupts are blocked
anyway so it's a moot point.

Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't
trip over an assertion about it while cold -- the assertion is meant
to detect reentrance into fpu_kern_enter/leave, which is prohibited.

Also initialize cpu0's ci_kfpu_spl.


# 1.202 07-Oct-2021 msaitoh

KNF. No functional change.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 07-Aug-2021 thorpej

Merge thorpej-cfargs2.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base
# 1.200 24-Apr-2021 thorpej

branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386(not amd64), it seems the improvement is very little.
- The rdtscp instruction can be used as serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel and AMD's document say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
of the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
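
The "measure 1000 times and pick the lowest observed value" step above can
be sketched as follows. This is only an illustration, not the code in
tsc.c: the function names and the idea of passing the samples in as an
array are assumptions made for the example; only the 1000-cycle threshold
comes from the message above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error threshold from the commit message, in clock cycles. */
#define SKEW_THRESHOLD_SKETCH 1000

/*
 * Hypothetical helper: return the lowest of n observed skew magnitudes.
 * Taking the minimum discards samples inflated by cache misses or other
 * transient delays.  Returns UINT64_MAX when n is 0.
 */
static uint64_t
lowest_observed(const uint64_t *samples, size_t n)
{
	uint64_t best = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		if (samples[i] < best)
			best = samples[i];
	}
	return best;
}

/* Hypothetical helper: does the best-case skew still exceed the threshold? */
static bool
skew_too_large(const uint64_t *samples, size_t n)
{
	return lowest_observed(samples, n) > SKEW_THRESHOLD_SKETCH;
}
```

With samples {5200, 310, 7800}, the lowest observed value is 310, so the
skew would be accepted even though two of the three measurements were
noisy.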


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take TSC value from CPUID. Some processors
can take TSC/core crystal clock ratio but core crystal clock frequency
can't be taken. Intel SDM give us the values for some processors.
- This also required changing lapic_per_second to make the LAPIC timer work
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
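
As a rough sketch of the arithmetic involved (not the code in
identcpu_subr.c): CPUID leaf 0x15 reports the TSC/core crystal clock ratio
as a numerator in EBX and a denominator in EAX, and, when available, the
crystal frequency in Hz in ECX. The function name below is hypothetical,
and the caller is assumed to supply a table value for the crystal
frequency when ECX reads as zero.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive the TSC frequency in Hz from CPUID 0x15.
 *
 *   eax        - denominator of the TSC/crystal ratio
 *   ebx        - numerator of the TSC/crystal ratio
 *   crystal_hz - core crystal clock in Hz (ECX, or a per-model table
 *                value when ECX is 0)
 *
 * Returns 0 when the leaf does not provide enough information.
 */
static uint64_t
tsc_freq_from_cpuid15(uint32_t eax, uint32_t ebx, uint64_t crystal_hz)
{
	if (eax == 0 || ebx == 0 || crystal_hz == 0)
		return 0;
	return crystal_hz * ebx / eax;
}
```

For example, a 24 MHz crystal with a ratio of 250/2 gives a 3 GHz TSC.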


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than before.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
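
The collision check in the example above can be sketched like this. It is
a simplified model, not the kCSan implementation: the structure layout and
function name are hypothetical, and it assumes that two reads never count
as a race.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory access. */
struct access_cell {
	uintptr_t addr;		/* start address of the access */
	size_t size;		/* number of bytes touched */
	bool write;		/* true for a write, false for a read */
};

/*
 * Report whether a new access collides with an existing cell: the byte
 * ranges overlap and at least one of the two accesses is a write.
 */
static bool
cells_conflict(const struct access_cell *cell, const struct access_cell *acc)
{
	if (!cell->write && !acc->write)
		return false;	/* assumption: read/read is harmless */
	return cell->addr < acc->addr + acc->size &&
	    acc->addr < cell->addr + cell->size;
}
```

Here cpu0's selected read would be the cell, and cpu1's write to an
overlapping address would be the access that makes the check fire.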

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that the cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which in the case
of UP), is 1.

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
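
A small illustration of the difference described above, assuming a
platform where unsigned int is 32 bits. The names uimin_sketch,
MIN_SKETCH and truncated_min are stand-ins invented for this example;
they mirror the shape of the libkern function and of the type-preserving
macro but are not the kernel's definitions.

```c
#include <stdint.h>

/* Same shape as the libkern helper: both operands are unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Type-preserving macro in the style of MIN: no truncation, but an
 * argument with side effects may be evaluated twice.
 */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/*
 * What happens when 64-bit values go through the unsigned int function.
 * The casts are written out here; at a real call site the same
 * conversion happens implicitly and silently.
 */
static uint64_t
truncated_min(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}
```

With a = 0x100000001 and b = 5, the function-based minimum sees a as 1
after truncation and returns 1, while the macro compares the full 64-bit
values and returns 5.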

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
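
The relation between the slot number and the address can be checked with
a few lines of arithmetic. This is a sketch assuming 4-level paging with
48-bit virtual addresses, where each L4 slot covers 2^39 bytes and bit 47
is sign-extended into the upper bits; the names below are made up for the
example and are not the kernel's macros.

```c
#include <stdint.h>

/* Each L4 slot maps 2^39 bytes (512 GiB) under 4-level paging. */
#define L4_SHIFT_SKETCH 39

/* Hypothetical helper: base virtual address of a given L4 slot. */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT_SKETCH;

	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

Slot 384 is 0x180, and 0x180 << 39 is 0x0000c00000000000; sign extension
then yields 0xffffc00000000000.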

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
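
The distinction between the base, extended, and display values can be
sketched as below. This is an illustration based on the usual x86 CPUID
leaf 1 signature layout, not a copy of the macros from this commit; the
function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static unsigned stepping(uint32_t sig)    { return sig & 0xf; }
static unsigned base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field only counts when the base is 0xf. */
static unsigned
display_family(uint32_t sig)
{
	unsigned f = base_family(sig);

	return (f == 0xf) ? f + ext_family(sig) : f;
}

/* Display model: the extended field counts for base family 0x6 or 0xf. */
static unsigned
display_model(uint32_t sig)
{
	unsigned f = base_family(sig);
	unsigned m = base_model(sig);

	if (f == 0x6 || f == 0xf)
		m |= ext_model(sig) << 4;
	return m;
}
```

For the signature 0x000306a9 this yields family 6, model 0x3a, stepping 9;
for 0x00600f12 it yields family 0x15, model 0x01, stepping 2.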


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
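
The fallback described above can be pictured roughly as follows. This is a
userland sketch, not the kernel function: the struct, the function-pointer
type, and the fake readers are hypothetical stand-ins for the CPUID_MSR
check, rdmsr(MSR_TSC), and cpu_counter().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*counter_fn)(void);

/* Hypothetical description of what the CPU offers. */
struct counter_source {
	bool has_msr;			/* models the CPUID_MSR feature bit */
	counter_fn read_msr_tsc;	/* models rdmsr(MSR_TSC) */
	counter_fn read_counter;	/* models cpu_counter() */
};

/* Prefer the MSR read when the CPU supports it, else the plain counter. */
static uint64_t
counter_serializing(const struct counter_source *src)
{
	assert(src != NULL && src->read_counter != NULL);
	if (src->has_msr && src->read_msr_tsc != NULL)
		return src->read_msr_tsc();
	return src->read_counter();
}

/* Fake readers with fixed values so the chosen path is observable. */
static uint64_t fake_msr(void)     { return 100; }
static uint64_t fake_counter(void) { return 200; }
```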


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of physical addresses, they are manipulated as
64-bit variables by the kernel and MMU.
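
The sizes quoted above follow from simple arithmetic, sketched here in plain
C. The helper names are invented for the example and are not kernel
interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of bytes addressable with the given number of address bits. */
static uint64_t
addressable_bytes(unsigned bits)
{
	assert(bits < 64);
	return UINT64_C(1) << bits;
}

/* Convert a byte count to whole GiB (2^30 bytes). */
static uint64_t
bytes_to_gib(uint64_t bytes)
{
	return bytes >> 30;
}

/* True if a physical address survives truncation to 32 bits. */
static bool
fits_in_32_bits(uint64_t pa)
{
	return (uint64_t)(uint32_t)pa == pa;
}
```

With 36 bits the first helper gives 2^36 bytes, which is 64 GiB, and the
highest such address no longer fits in a 32-bit variable, hence the 64-bit
type.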

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
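
For reference, rounding up to a power-of-two boundary is usually done with a
mask, as in the sketch below. This is a standalone function written for
illustration (the name is hypothetical), not the roundup2() macro itself,
and it assumes the alignment is a nonzero power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align (a nonzero power of two). */
static uint64_t
round_up_pow2(uint64_t x, uint64_t align)
{
	/* A power of two has exactly one bit set. */
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}
```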


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
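
The selection logic described above might look roughly like this. It is a
userland sketch under stated assumptions: the trait struct, the routine
names, and the do_idle() macro are hypothetical, and the idle routines only
record which one ran instead of touching hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum idle_kind { IDLE_NONE, IDLE_XEN, IDLE_MWAIT, IDLE_HALT };

/* Records which fake idle routine ran last. */
static enum idle_kind last_idle = IDLE_NONE;

static void idle_xen(void)   { last_idle = IDLE_XEN; }
static void idle_mwait(void) { last_idle = IDLE_MWAIT; }
static void idle_halt(void)  { last_idle = IDLE_HALT; }

/* The idle entry point is a macro that calls through a pointer. */
static void (*idle_routine)(void) = idle_halt;
#define do_idle()	((*idle_routine)())

/* Hypothetical summary of the properties the choice depends on. */
struct cpu_traits {
	bool is_xen;
	bool has_mwait;
	bool is_amd;
};

/* Xen first, then mwait on non-AMD CPUs that have it, else halt. */
static void
select_idle(const struct cpu_traits *t)
{
	assert(t != NULL);
	if (t->is_xen)
		idle_routine = idle_xen;
	else if (t->has_mwait && !t->is_amd)
		idle_routine = idle_mwait;
	else
		idle_routine = idle_halt;
}
```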


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
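
One common way to compute the greatest power of two dividing a number is to
isolate its lowest set bit. The sketch below shows that technique; it is not
necessarily how this commit implemented it, and the function name is
invented for the example.

```c
#include <assert.h>

/*
 * Greatest power of two that divides n (n must be nonzero).
 * In unsigned arithmetic, ~n + 1 is the two's complement of n, and
 * ANDing it with n leaves only the lowest set bit.
 */
static unsigned
largest_pow2_divisor(unsigned n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

For example, 24 colors would be reduced to 8, while 64 stays at 64.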


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
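
As an illustration of the general technique only: a flags word shared
between CPUs can be updated with atomic read-modify-write operations instead
of a plain |= or &=, which could lose a concurrent update. The sketch uses
C11 <stdatomic.h> as a stand-in for the kernel's own atomic primitives; the
flag names and helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits. */
#define FLAG_PRESENT	0x01u
#define FLAG_RUNNING	0x02u

/* Atomically set the given bits in the flags word. */
static void
flags_set(_Atomic uint32_t *flags, uint32_t bits)
{
	atomic_fetch_or(flags, bits);
}

/* Atomically clear the given bits in the flags word. */
static void
flags_clear(_Atomic uint32_t *flags, uint32_t bits)
{
	atomic_fetch_and(flags, ~bits);
}
```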


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.203 01-Apr-2022 riastradh

x86, arm: Allow fpu_kern_enter/leave while cold.

Normally these are forbidden above IPL_VM, so that FPU usage doesn't
block IPL_SCHED or IPL_HIGH interrupts. But while cold, e.g. during
builtin module initialization at boot, all interrupts are blocked
anyway so it's a moot point.

Also initialize x86 cpu_info_primary.ci_kfpu_spl to -1 so we don't
trip over an assertion about it while cold -- the assertion is meant
to detect reentrance into fpu_kern_enter/leave, which is prohibited.

Also initialize cpu0's ci_kfpu_spl.


# 1.202 07-Oct-2021 msaitoh

KNF. No functional change.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 07-Aug-2021 thorpej

Merge thorpej-cfargs2.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base
# 1.200 24-Apr-2021 thorpej

branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.
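
The tag-value calling convention described above can be sketched as
follows. This is an illustrative sketch only: the XARG_* tags, struct xargs
and xargs_parse() are hypothetical stand-ins, not the real CFARG_*
definitions or the config_found() implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags modelled on the CFARG_* idea; not the real values. */
enum xarg_tag { XARG_EOL = 0, XARG_IATTR, XARG_LOCATORS };

struct xargs {
	const char *iattr;	/* interface attribute name, or NULL */
	const int *locators;	/* locators array, or NULL */
};

/* Consume (tag, value) pairs until the XARG_EOL sentinel is seen. */
static struct xargs
xargs_parse(int first_tag, ...)
{
	struct xargs out = { NULL, NULL };
	va_list ap;
	int tag;

	va_start(ap, first_tag);
	for (tag = first_tag; tag != XARG_EOL; tag = va_arg(ap, int)) {
		switch (tag) {
		case XARG_IATTR:
			out.iattr = va_arg(ap, const char *);
			break;
		case XARG_LOCATORS:
			out.locators = va_arg(ap, const int *);
			break;
		default:
			assert(!"unknown tag");
		}
	}
	va_end(ap);
	return out;
}
```

A caller passes only the arguments it needs, for example
xargs_parse(XARG_IATTR, "pci", XARG_EOL), and new tags can be added later
without touching existing call sites.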

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc using lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386(not amd64), it seems the improvement is very little.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
of the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
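
The "measure many times and keep the lowest value" step can be sketched
like this. It is a minimal sketch under stated assumptions: sample_fn,
min_of_samples() and the replay helper are hypothetical names used only for
illustration, and the real code reads the TSC on two CPUs instead of
replaying an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns one measured delta in clock cycles. */
typedef uint64_t (*sample_fn)(void *);

/*
 * Take n samples and keep the smallest.  Noise from interrupts and cache
 * misses only ever adds latency, so the minimum is the best estimate.
 */
static uint64_t
min_of_samples(sample_fn fn, void *arg, size_t n)
{
	uint64_t best = UINT64_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t v = fn(arg);
		if (v < best)
			best = v;
	}
	return best;
}

/* Demo sampler: replays a fixed array of observations in a loop. */
struct replay {
	const uint64_t *vals;
	size_t len;
	size_t pos;
};

static uint64_t
replay_sample(void *arg)
{
	struct replay *r = arg;

	return r->vals[r->pos++ % r->len];
}
```

With n = 1000 as in the commit, a single undisturbed measurement is enough
to pull the result down to the true value.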


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives the values for some of those processors.
- It is also required to change lapic_per_second to make the LAPIC timer
work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
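
As a worked example of the leaf 0x15 arithmetic: the leaf returns the
TSC/crystal ratio as EBX/EAX and, when enumerated, the crystal frequency in
Hz in ECX. The helper below is a hypothetical sketch of that calculation,
not the code in identcpu_subr.c, and it does not include the per-model
crystal fallback table mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * eax: denominator of the TSC/crystal ratio
 * ebx: numerator of the TSC/crystal ratio
 * ecx: core crystal clock frequency in Hz (0 if not enumerated)
 *
 * Returns the TSC frequency in Hz, or 0 when the leaf does not carry
 * enough information to compute it.
 */
static uint64_t
tsc_hz_from_leaf15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
	if (eax == 0 || ebx == 0 || ecx == 0)
		return 0;
	/* Widen before multiplying so the product cannot overflow 32 bits. */
	return (uint64_t)ecx * ebx / eax;
}
```

For instance, a 24 MHz crystal with a ratio of 176/2 gives
24000000 * 176 / 2 = 2112000000 Hz.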


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Previously it was obtained in the config_interrupts context.
However, it is required when attaching a device,
so obtain it earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
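
The cell check can be modelled in a few lines. This is a single-threaded
sketch under simplifying assumptions (one global cell, no delay, no
per-CPU state); struct cell and the helper names are hypothetical and are
not taken from the kCSan sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Describes one sampled memory access that is currently being watched. */
struct cell {
	bool active;
	uintptr_t addr;
	size_t size;
	bool write;
};

static struct cell watch;	/* one cell only, to keep the sketch small */

/* Called for the sampled access, before the CPU is delayed. */
static void
cell_arm(const void *p, size_t size, bool write)
{
	watch.addr = (uintptr_t)p;
	watch.size = size;
	watch.write = write;
	watch.active = true;
}

/* Called once the delay is over. */
static void
cell_disarm(void)
{
	watch.active = false;
}

/* Called on every access: true if it races with the armed cell. */
static bool
access_conflicts(const void *p, size_t size, bool write)
{
	uintptr_t a = (uintptr_t)p;

	if (!watch.active)
		return false;
	if (a + size <= watch.addr || watch.addr + watch.size <= a)
		return false;		/* ranges do not overlap */
	return write || watch.write;	/* two reads never race */
}
```

In the cpu0/cpu1 example, cpu0 arms a read cell and cpu1's overlapping write
makes access_conflicts() return true.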

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
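
The difference between the two forms can be seen in a small sketch. The
uimin() below assumes the shape described above (both arguments and the
result are unsigned int); it is written out here for illustration and is
not copied from libkern.

```c
#include <assert.h>
#include <stdint.h>

/* Function form: arguments are converted to unsigned int on the way in. */
static inline unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no truncation, but each argument may be evaluated twice. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
```

With a 32-bit unsigned int, passing the 64-bit value 2^32 + 5 to uimin()
together with 100 yields 5, because the high bits are dropped before the
comparison, whereas MIN() compares the full 64-bit values and yields 100.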

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify code around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of teh changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
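
The split between base, extended and display fields can be sketched as follows. This is a minimal illustration, not the NetBSD macros themselves: the sig_* helper names are hypothetical, the bit positions are the standard CPUID leaf 1 EAX signature layout, and the rule for when the extended model is folded in (base family 6 or 0xf) follows Intel's documented convention and is an assumption about what the macros do.

```c
#include <stdint.h>

/* Hypothetical helpers; field positions are the CPUID leaf 1 EAX layout. */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when the base is 0xf. */
static inline uint32_t
sig_family(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);

	return base == 0xf ? base + sig_extfamily(sig) : base;
}

/*
 * Display model: the extended model becomes the upper nibble when the
 * base family is 6 or 0xf (assumed rule, see the text above).
 */
static inline uint32_t
sig_model(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (base == 0x6 || base == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

With these helpers a signature of 0x000306a9 decodes to family 6, model 0x3a, stepping 9, and 0x00600f12 decodes to family 0x15, model 1, stepping 2.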


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
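
The fallback described above can be sketched roughly as below. This is only an illustration under stated assumptions: struct counter_ops, counter_serializing() and the two fake_* readers are hypothetical stand-ins so the logic can run in userland; the real function reads the hardware directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of what the CPU offers. */
struct counter_ops {
	bool has_msr;                    /* models "CPUID_MSR is there" */
	uint64_t (*read_tsc_msr)(void);  /* stands in for rdmsr(MSR_TSC) */
	uint64_t (*read_counter)(void);  /* stands in for cpu_counter() */
};

/* Stand-in readers returning fixed values, for demonstration only. */
static uint64_t fake_tsc_msr(void) { return 111; }
static uint64_t fake_counter(void) { return 222; }

/* Prefer the MSR read when the CPU supports it, else use the plain counter. */
static uint64_t
counter_serializing(const struct counter_ops *ops)
{
	assert(ops != NULL);
	if (ops->has_msr)
		return ops->read_tsc_msr();
	return ops->read_counter();
}
```

A CPU without MSR support would be described with has_msr set to false, in which case only read_counter is ever called.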


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by the kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.
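
The address widths above can be made concrete with a small sketch. The helper names are hypothetical; the 2 + 9 + 9 + 12 bit split is the standard x86 PAE layout for 4 KiB pages, not something taken from this commit, and the neutral pdpt/pd/pt naming is an assumption that sidesteps the kernel's own level numbering.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAE_PA_BITS 36	/* physical address width quoted in the text */

/* Index widths of the standard PAE 4 KiB-page walk must cover a 32-bit VA. */
static_assert(2 + 9 + 9 + 12 == 32, "PAE index widths must add up to 32");

/* Size of the physical address space reachable with 36-bit addresses. */
static inline uint64_t
demo_pae_phys_size(void)
{
	return (uint64_t)1 << DEMO_PAE_PA_BITS;
}

/* Top 2 bits select one of four page-directory-pointer entries. */
static inline unsigned
demo_pae_pdpt_index(uint32_t va)
{
	return va >> 30;
}

/* Next 9 bits select one of 512 64-bit page-directory entries. */
static inline unsigned
demo_pae_pd_index(uint32_t va)
{
	return (va >> 21) & 0x1ff;
}

/* Next 9 bits select one of 512 64-bit page-table entries. */
static inline unsigned
demo_pae_pt_index(uint32_t va)
{
	return (va >> 12) & 0x1ff;
}

/* Low 12 bits are the byte offset within the 4 KiB page. */
static inline unsigned
demo_pae_page_offset(uint32_t va)
{
	return va & 0xfff;
}
```

For example, the virtual address 0xc0201234 splits into pointer-table index 3, directory index 1, table index 1 and page offset 0x234. Entries are 64 bits wide because a 36-bit physical address no longer fits in a 32-bit entry.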

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
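
For readers unfamiliar with the idiom, rounding up to a power-of-two boundary can be sketched as below. demo_roundup2() is a hypothetical function written for illustration; it is assumed to behave like the roundup2() macro but is not copied from the NetBSD headers.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of m, where m must be a power of two. */
static uintptr_t
demo_roundup2(uintptr_t x, uintptr_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + (m - 1)) & ~(m - 1);
}
```

Rounding 13 up to a multiple of 8 gives 16, while a value that is already aligned, such as 16, is returned unchanged.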


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.
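
The difference between the two orders comes down to appending at the tail rather than pushing at the head. A minimal sketch, assuming a singly linked list with a tail pointer; the demo_* types and functions are hypothetical and do not reflect the actual struct cpu_info layout:

```c
#include <assert.h>
#include <stddef.h>

struct demo_cpu {
	int index;
	struct demo_cpu *next;
};

struct demo_cpu_list {
	struct demo_cpu *head;
	struct demo_cpu **tailp;	/* points at the last "next" slot */
};

static void
demo_list_init(struct demo_cpu_list *l)
{
	assert(l != NULL);
	l->head = NULL;
	l->tailp = &l->head;
}

/* Append at the tail so that list order matches attach order. */
static void
demo_list_append(struct demo_cpu_list *l, struct demo_cpu *ci)
{
	assert(l != NULL && ci != NULL);
	ci->next = NULL;
	*l->tailp = ci;
	l->tailp = &ci->next;
}
```

Inserting at the head instead (ci->next = head; head = ci) is what yields the inverse order mentioned above.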


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
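
The selection rule above can be sketched as a function that returns the idle routine to install. All names here are hypothetical and the three idle routines are empty placeholders; only the decision order (Xen first, then MWAIT on non-AMD CPUs, then HLT) is taken from the text.

```c
#include <stdbool.h>

typedef void (*demo_idle_fn_t)(void);

/* Placeholder idle routines; real ones would block the CPU. */
static void demo_idle_xen(void)   { }
static void demo_idle_mwait(void) { }
static void demo_idle_halt(void)  { }

/* Pick the idle routine once, at boot; the idle loop then calls the pointer. */
static demo_idle_fn_t
demo_pick_idle(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		return demo_idle_xen;
	if (has_mwait && !is_amd)
		return demo_idle_mwait;
	return demo_idle_halt;
}
```

Choosing the routine once and calling it through a pointer keeps the per-iteration cost of the idle loop to a single indirect call.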


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
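
The greatest power of two that divides a number is its lowest set bit, which can be isolated in one step. A minimal sketch; demo_pow2_divisor() is a hypothetical name and this is not necessarily how the committed code computes it:

```c
#include <assert.h>

/* Largest power of two dividing n: isolate the lowest set bit of n. */
static unsigned int
demo_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

For instance, 48 colors would be reduced to 16 and 12 colors to 4, while an odd count collapses to 1.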


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
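
A rough userland sketch of the idea, using C11 <stdatomic.h> as a stand-in for the kernel's own atomic operations (an assumption made so the example is self-contained). The flag names and helpers are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_CPUF_PRESENT 0x1u
#define DEMO_CPUF_RUNNING 0x2u

/* Set bits without a lock: a single atomic read-modify-write. */
static inline void
demo_flags_set(_Atomic uint32_t *flags, uint32_t bits)
{
	atomic_fetch_or(flags, bits);
}

/* Clear bits the same way, so concurrent updates are never lost. */
static inline void
demo_flags_clear(_Atomic uint32_t *flags, uint32_t bits)
{
	atomic_fetch_and(flags, ~bits);
}
```

A plain "flags |= bits" is a separate load and store, so two CPUs updating different bits at once could overwrite each other's change; the atomic form avoids that.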


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.
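
The lock-prefix patching can be illustrated on a plain byte buffer. This is a sketch under assumptions: the function name, the table of patch sites and the buffer are hypothetical, and it runs on ordinary data rather than live kernel text. The opcode values are the x86 LOCK prefix (0xf0) and NOP (0x90).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OPC_LOCK 0xf0	/* x86 LOCK prefix byte */
#define DEMO_OPC_NOP  0x90	/* x86 one-byte NOP */

/*
 * Replace each recorded LOCK prefix with a NOP when only one CPU is
 * present.  Returns the number of bytes patched.
 */
static size_t
demo_patch_lock_prefixes(uint8_t *text, const size_t *sites, size_t nsites,
    int ncpu)
{
	size_t i, patched = 0;

	if (ncpu != 1)
		return 0;
	for (i = 0; i < nsites; i++) {
		/* Each site must really hold a LOCK prefix. */
		assert(text[sites[i]] == DEMO_OPC_LOCK);
		text[sites[i]] = DEMO_OPC_NOP;
		patched++;
	}
	return patched;
}
```

In the kernel the patched bytes are executable code, which is why the commit follows the patching with a wbinvd.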


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.202 07-Oct-2021 msaitoh

KNF. No functional change.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 07-Aug-2021 thorpej

Merge thorpej-cfargs2.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base
# 1.200 24-Apr-2021 thorpej

branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.
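
The tag-value calling convention can be sketched with <stdarg.h>. Everything named demo_* below is hypothetical and only mirrors the shape described above (tags followed by a value, terminated by a sentinel); it is not the autoconf implementation, and the value types chosen for each tag are assumptions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags; the sentinel ends the argument list. */
enum demo_cfarg {
	DEMO_CFARG_EOL = 0,
	DEMO_CFARG_IATTR,	/* value: const char * */
	DEMO_CFARG_LOCATORS	/* value: const int * */
};

struct demo_cfargs {
	const char *iattr;
	const int *locators;
};

/* Walk tag/value pairs until the sentinel; returns -1 on an unknown tag. */
static int
demo_parse_cfargs(struct demo_cfargs *out, ...)
{
	va_list ap;
	int tag, rv = 0;

	assert(out != NULL);
	out->iattr = NULL;
	out->locators = NULL;

	va_start(ap, out);
	while ((tag = va_arg(ap, int)) != DEMO_CFARG_EOL) {
		switch (tag) {
		case DEMO_CFARG_IATTR:
			out->iattr = va_arg(ap, const char *);
			break;
		case DEMO_CFARG_LOCATORS:
			out->locators = va_arg(ap, const int *);
			break;
		default:
			/* Unknown tag: its value type is unknown, so stop. */
			rv = -1;
			goto done;
		}
	}
done:
	va_end(ap);
	return rv;
}
```

Callers pass only the tags they need, and a new tag can be added without touching existing call sites, which is the extensibility the text refers to.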

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and gave a big improvement, but
there is still room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than that of rdtsc, so I suspect the difference
in the results comes from that.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use this to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
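
The "measure 1000 times and pick the lowest observed value" step above
can be sketched as follows. This is only an illustration under
assumptions: the sampler callback, sketch_min_sample() and the fake
sampler used to exercise it are hypothetical and are not taken from
tsc.c. The idea is that noise such as interrupts or cache misses only
ever adds to a measurement, so the minimum is the best estimate.

```c
#include <stdint.h>

/* Hypothetical sampler: returns one measurement (e.g. a skew in cycles). */
typedef uint64_t (*sketch_sample_fn)(void *);

/*
 * Take n samples and keep the lowest one.
 * Returns UINT64_MAX if n is 0 (no sample was taken).
 */
static uint64_t
sketch_min_sample(sketch_sample_fn fn, void *arg, unsigned int n)
{
	uint64_t best = UINT64_MAX;
	unsigned int i;

	for (i = 0; i < n; i++) {
		uint64_t v = fn(arg);

		if (v < best)
			best = v;
	}
	return best;
}

/* Deterministic stand-in for a real measurement, for demonstration only. */
struct sketch_fake {
	const uint64_t *vals;
	unsigned int len;
	unsigned int next;
};

static uint64_t
sketch_fake_sample(void *arg)
{
	struct sketch_fake *f = arg;
	uint64_t v = f->vals[f->next % f->len];

	f->next++;
	return v;
}
```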


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take TSC value from CPUID. Some processors
report the TSC/core crystal clock ratio but not the core crystal clock
frequency. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
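
For reference, the arithmetic behind the CPUID 0x15 case can be
sketched as below. CPUID leaf 0x15 reports the TSC/crystal ratio as
EBX (numerator) over EAX (denominator) and the crystal frequency in Hz
in ECX, which may be zero. The function name is hypothetical and this
is not the code in identcpu_subr.c; a zero return here simply means the
caller would have to fall back to a table or to calibration.

```c
#include <stdint.h>

/*
 * Compute the TSC frequency in Hz from CPUID leaf 0x15 register values.
 * Returns 0 if any required field is not enumerated (zero).
 */
static uint64_t
sketch_tsc_freq_from_cpuid15(uint32_t eax_denominator,
    uint32_t ebx_numerator, uint32_t ecx_crystal_hz)
{
	if (eax_denominator == 0 || ebx_numerator == 0 ||
	    ecx_crystal_hz == 0)
		return 0;

	/* Widen before multiplying so the product cannot overflow. */
	return (uint64_t)ecx_crystal_hz * ebx_numerator / eax_denominator;
}
```

With a 24 MHz crystal and a ratio of 176/2, this yields 2112000000 Hz.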


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
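
To illustrate the difference described above, here is a small sketch
assuming a 32-bit unsigned int. The sketch_* names are hypothetical
stand-ins, not the libkern definitions. A function taking unsigned int
silently drops the upper bits of a 64-bit argument, while a macro
compares the operands at their own width (at the cost of evaluating
each argument more than once).

```c
#include <stdint.h>

/* Function form: arguments are converted to unsigned int on the call. */
static unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no conversion, but arguments may be evaluated twice. */
#define SKETCH_MIN(a, b)	((a) < (b) ? (a) : (b))

/* 64-bit values passed through the function form get truncated first. */
static uint64_t
sketch_min_via_function(uint64_t a, uint64_t b)
{
	return sketch_uimin(a, b);
}

/* 64-bit values compared through the macro keep their full width. */
static uint64_t
sketch_min_via_macro(uint64_t a, uint64_t b)
{
	return SKETCH_MIN(a, b);
}
```

For a = 0x100000002 and b = 3, the function form compares 2 against 3
and returns 2, while the macro form returns the correct minimum, 3.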

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
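
The slot-to-address relation above can be checked with a short sketch,
assuming 4-level paging with 48-bit canonical addresses: each L4 entry
covers 2^39 bytes, and bit 47 is sign-extended into bits 48-63. The
function name is hypothetical.

```c
#include <stdint.h>

/*
 * Return the base virtual address covered by an L4 slot (0..511),
 * assuming 4-level paging and 48-bit canonical addresses.
 */
static uint64_t
sketch_l4_slot_base(unsigned int slot)
{
	uint64_t va = (uint64_t)(slot & 0x1ff) << 39;

	/* Sign-extend bit 47 to form a canonical address. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

Slot 384 is 0x180, and 0x180 << 39 is 0xc00000000000; bit 47 is set, so
the canonical form is 0xffffc00000000000.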

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with
AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
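
As an illustration of the base, extended and display fields above, the
following sketch decodes a CPUID leaf 1 EAX signature using the rules
from the Intel and AMD manuals: the extended family is added only when
the base family is 0xf, and the extended model is used only when the
base family is 0x6 or 0xf. The function names are hypothetical; these
are not the macro definitions from specialreg.h.

```c
#include <stdint.h>

/* Base fields, taken directly from the signature. */
static uint32_t
sketch_base_family(uint32_t sig)
{
	return (sig >> 8) & 0xf;
}

static uint32_t
sketch_base_model(uint32_t sig)
{
	return (sig >> 4) & 0xf;
}

/* Display family: add the extended family only if base family is 0xf. */
static uint32_t
sketch_display_family(uint32_t sig)
{
	uint32_t base = sketch_base_family(sig);
	uint32_t ext = (sig >> 20) & 0xff;

	return base == 0xf ? base + ext : base;
}

/* Display model: prepend the extended model for base family 0x6 or 0xf. */
static uint32_t
sketch_display_model(uint32_t sig)
{
	uint32_t base = sketch_base_family(sig);
	uint32_t ext = (sig >> 16) & 0xf;

	if (base == 0x6 || base == 0xf)
		return (ext << 4) | sketch_base_model(sig);
	return sketch_base_model(sig);
}
```

For the signature 0x000906ea this gives family 0x6 and model 0x9e; for
0x00800f11 it gives family 0x17 and model 0x01.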


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark a few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
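
The fallback described above can be sketched roughly as follows. This is
only an illustration, not the code from this revision: the structure, the
function names and the stub readers are all hypothetical, and the real
kernel checks CPUID feature bits rather than a boolean field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of what one CPU can do. */
struct demo_counter_ops {
	bool has_msr;				/* stands in for the CPUID_MSR check */
	uint64_t (*read_msr_tsc)(void);		/* stands in for rdmsr(MSR_TSC) */
	uint64_t (*read_counter)(void);		/* stands in for cpu_counter() */
};

/* Prefer the MSR read when the CPU supports it, else use the plain counter. */
static uint64_t
demo_read_counter(const struct demo_counter_ops *ops)
{
	assert(ops != NULL && ops->read_counter != NULL);
	if (ops->has_msr && ops->read_msr_tsc != NULL)
		return ops->read_msr_tsc();
	return ops->read_counter();
}

/* Stub readers with fixed values so the selection logic can be exercised. */
static uint64_t demo_stub_msr(void) { return 111; }
static uint64_t demo_stub_counter(void) { return 222; }
```

A CPU without MSR support simply leaves has_msr false, so callers never
need to know which path was taken.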


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical addresses, they are manipulated as
64-bit variables by the kernel and MMU.
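
As a quick check of the sizes quoted above, the arithmetic can be written
out as below. The helper name is made up for this illustration and is not
part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of GiB addressable with the given address width.
 * 1 GiB is 2^30 bytes, so the result is 2^(bits - 30).
 */
static uint64_t
demo_addressable_gib(unsigned int bits)
{
	assert(bits >= 30 && bits < 64);
	return (UINT64_C(1) << bits) >> 30;
}
```

With 36 bits this gives 64, and with 32 bits it gives 4, which also shows
why a 32-bit variable can no longer hold a physical address under PAE.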

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
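
For readers unfamiliar with the idiom, rounding up to a power-of-two
multiple is usually done with a mask, as in the sketch below. The helper
name is hypothetical; it is shown as a plain function rather than the
kernel macro, and it assumes the multiple really is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of m, where m must be a power of two.
 * Adding m - 1 pushes x past the next boundary unless it is already
 * aligned; masking with ~(m - 1) then clears the low bits.
 */
static size_t
demo_round_up_pow2(size_t x, size_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + m - 1) & ~(m - 1);
}
```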


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
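
One common way to compute "the greatest power of two which is a divisor"
is to isolate the lowest set bit. The sketch below shows that idiom; the
function name is hypothetical and this is not necessarily how the commit
itself computes the value.

```c
#include <assert.h>

/*
 * Return the greatest power of two that divides n (n must be nonzero).
 * In unsigned arithmetic, n & (0 - n) keeps only the lowest set bit of n,
 * and that bit is exactly the largest power-of-two divisor.
 */
static unsigned int
demo_largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (0u - n);
}
```

For example, 24 colors would be reduced to 8, while a count that is
already a power of two is returned unchanged.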


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.201 07-Aug-2021 thorpej

Merge thorpej-cfargs2.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base
# 1.200 24-Apr-2021 thorpej

branches: 1.200.8;
Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.
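
The general shape of a variadic tag-value interface terminated by a
sentinel can be sketched as below. Everything here is hypothetical and
deliberately does not use the real CFARG_* tags or the autoconf(9)
functions; it only illustrates the calling convention.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags; DEMO_EOL plays the role of the sentinel. */
enum demo_tag { DEMO_EOL = 0, DEMO_NAME, DEMO_UNIT };

struct demo_args {
	const char *name;	/* value following DEMO_NAME, or NULL */
	int unit;		/* value following DEMO_UNIT, or -1 */
};

/* Walk (tag, value) pairs until the sentinel tag is seen. */
static struct demo_args
demo_parse(int first_tag, ...)
{
	struct demo_args out = { NULL, -1 };
	va_list ap;

	va_start(ap, first_tag);
	for (int tag = first_tag; tag != DEMO_EOL; tag = va_arg(ap, int)) {
		switch (tag) {
		case DEMO_NAME:
			out.name = va_arg(ap, const char *);
			break;
		case DEMO_UNIT:
			out.unit = va_arg(ap, int);
			break;
		default:
			/* Unknown tag: its value type is unknown, so stop. */
			assert(0 && "unknown tag");
			va_end(ap);
			return out;
		}
	}
	va_end(ap);
	return out;
}
```

A caller passes only the tags it cares about, for example
demo_parse(DEMO_UNIT, 3, DEMO_EOL), which is what lets one entry point
replace several fixed-argument variants.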

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), it seems the improvement is very little.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
in the results comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
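
The "measure 1000 times and pick the lowest observed value" step above can be sketched as follows. This is a minimal illustration, not the code from tsc.c: the function names are hypothetical, and the samples are assumed to be skew magnitudes in clock cycles collected by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest of n samples.
 * The smallest sample is the one least disturbed by one-off delays
 * such as interrupts or cache misses. The caller guarantees n >= 1.
 */
uint64_t
min_sample(const uint64_t *samples, size_t n)
{
	uint64_t best = samples[0];

	for (size_t i = 1; i < n; i++) {
		if (samples[i] < best)
			best = samples[i];
	}
	return best;
}

/*
 * Hypothetical check: is the best-case skew within the error threshold?
 * The commit message above uses a threshold of 1000 clock cycles.
 */
int
skew_within_threshold(uint64_t skew_cycles, uint64_t threshold_cycles)
{
	return skew_cycles <= threshold_cycles;
}
```

Taking the minimum rather than the mean means a single slow measurement cannot push an otherwise well-synchronized CPU over the threshold.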


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC value from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer
work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
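
The CPUID 0x15 computation described above can be sketched as follows. This is an illustration under stated assumptions, not the code in identcpu_subr.c: the function name and the fallback parameter are hypothetical, and the register meanings (EAX = ratio denominator, EBX = ratio numerator, ECX = core crystal clock in Hz) follow the Intel SDM description of leaf 0x15.

```c
#include <stdint.h>

/*
 * Sketch: derive the TSC frequency from CPUID leaf 0x15 register values.
 *   eax = denominator of the TSC/core crystal clock ratio
 *   ebx = numerator of the TSC/core crystal clock ratio
 *   ecx = core crystal clock frequency in Hz (0 if not enumerated)
 * fallback_crystal_hz is a hypothetical per-model value supplied by the
 * caller for processors that report the ratio but not the crystal clock.
 * Returns 0 when the frequency cannot be determined.
 */
uint64_t
tsc_freq_from_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx,
    uint64_t fallback_crystal_hz)
{
	uint64_t crystal_hz = (ecx != 0) ? ecx : fallback_crystal_hz;

	if (eax == 0 || ebx == 0 || crystal_hz == 0)
		return 0;

	/* TSC frequency = crystal frequency * numerator / denominator */
	return crystal_hz * ebx / eax;
}
```

For example, a ratio of 216/2 with a 24 MHz crystal gives 2,592,000,000 Hz.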


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
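
The cell check described above can be sketched as a small single-threaded model. This is not the kCSan implementation: the structure, the function names and the two-CPU limit are hypothetical, and treating read/read overlaps as harmless is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU_SKETCH	2	/* hypothetical: two CPUs are enough here */

/* One cell per CPU, describing the access that CPU is delayed on. */
struct cell_sketch {
	bool		valid;
	uintptr_t	addr;
	size_t		size;
	bool		write;
};

static struct cell_sketch cells[NCPU_SKETCH];

/* Publish a cell for a sampled access, before the CPU starts its delay. */
void
cell_set(unsigned int cpu, uintptr_t addr, size_t size, bool write)
{
	cells[cpu] = (struct cell_sketch){ true, addr, size, write };
}

/* Retire the cell once the delay is over. */
void
cell_clear(unsigned int cpu)
{
	cells[cpu].valid = false;
}

/*
 * Check an access against the cells of all other CPUs. It is reported
 * when the byte ranges overlap and at least one side is a write.
 */
bool
access_races(unsigned int cpu, uintptr_t addr, size_t size, bool write)
{
	for (unsigned int i = 0; i < NCPU_SKETCH; i++) {
		const struct cell_sketch *c = &cells[i];

		if (i == cpu || !c->valid)
			continue;
		if (!write && !c->write)
			continue;	/* read/read: ignored in this sketch */
		if (addr < c->addr + c->size && c->addr < addr + size)
			return true;
	}
	return false;
}
```

In the cpu0/cpu1 example from the text, cpu0 would call cell_set() for its read, and cpu1's write to the same address would make access_races() return true.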

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
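
The difference between the two kinds of min can be sketched as follows. The names below are illustrative stand-ins rather than the libkern definitions, and the sketch assumes a 32-bit unsigned int.

```c
#include <stdint.h>

/* Function-style min on unsigned int: wider arguments are truncated. */
unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Macro-style min: operands keep their own type, so nothing is
 * truncated, but each argument may be evaluated twice.
 */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/* Clamp a 64-bit length through the unsigned int function. */
uint64_t
clamp_truncating(uint64_t len, uint64_t limit)
{
	/* The casts make explicit what a plain call would do silently. */
	return uimin_sketch((unsigned int)len, (unsigned int)limit);
}

/* Clamp a 64-bit length through the macro. */
uint64_t
clamp_macro(uint64_t len, uint64_t limit)
{
	return MIN_SKETCH(len, limit);
}
```

With len = 0x100000001 and limit = 16, the function-based clamp sees len as 1 and returns 1, while the macro-based clamp returns 16.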

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
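
The slot-to-address arithmetic above can be checked with a short sketch. It assumes 4-level paging with 48-bit virtual addresses, where each L4 slot covers 2^39 bytes; the function and macro names are hypothetical.

```c
#include <stdint.h>

#define L4_SHIFT_SKETCH	39	/* each L4 slot maps 2^39 bytes (512 GB) */

/*
 * Return the lowest virtual address mapped by the given L4 slot
 * (0..511), in canonical form.
 */
uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT_SKETCH;

	/* Canonical form: bits 48..63 must be copies of bit 47. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

Slot 384 is 3 * 2^7, so the shifted value is 3 * 2^46 = 0x0000c00000000000; bit 47 is set, and sign extension yields 0xffffc00000000000.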

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
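
The base, extended and display fields can be sketched as follows. These functions are illustrative and are not the macros from the commit; the bit layout and the combination rule follow the Intel SDM description of the CPUID leaf 1 EAX signature (extended family added when the base family is 0xf, extended model prepended when the base family is 0x6 or 0xf).

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
unsigned int sig_stepping(uint32_t sig)    { return sig & 0xf; }
unsigned int sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
unsigned int sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
unsigned int sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
unsigned int sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family counts only if base family is 0xf. */
unsigned int
sig_display_family(uint32_t sig)
{
	unsigned int family = sig_base_family(sig);

	if (family == 0xf)
		family += sig_ext_family(sig);
	return family;
}

/* Display model: the extended model counts for base family 0x6 or 0xf. */
unsigned int
sig_display_model(uint32_t sig)
{
	unsigned int family = sig_base_family(sig);
	unsigned int model = sig_base_model(sig);

	if (family == 0x6 || family == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

For the signature 0x000906ea this gives family 0x6, model 0x9e, stepping 0xa; for 0x00800f11 it gives family 0x17, model 0x01, stepping 0x1.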


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
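
The address-width arithmetic above can be checked with a minimal sketch.
demo_addr_space_bytes() is a hypothetical helper written for this note, not
a function from the NetBSD tree:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size in bytes of an address space that is `bits` wide.
 * Hypothetical helper for illustration only.
 */
static uint64_t
demo_addr_space_bytes(unsigned int bits)
{
	assert(bits < 64);	/* a shift by 64 or more would be undefined */
	return UINT64_C(1) << bits;
}
```

With 36 bits this gives 64 GiB of physical address space, and with 32 bits
4 GiB of virtual address space, matching the figures quoted above.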

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
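
A minimal sketch of the workaround described above, assuming the usual
lowest-set-bit trick. largest_pow2_divisor() is a hypothetical name chosen
for this note and is not necessarily how cpu.c implements it:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the greatest power of two that divides n.
 * Hypothetical helper for illustration; the code in cpu.c may differ.
 */
static uint32_t
largest_pow2_divisor(uint32_t n)
{
	assert(n != 0);		/* zero has no greatest power-of-two divisor */
	return n & (~n + 1);	/* two's-complement negate, keep lowest set bit */
}
```

For a machine that reports 24 cache colors, this would select 8, the largest
power of two that still divides the desired count.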


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.200 24-Apr-2021 thorpej

Merge thorpej-cfargs branch:

Simplify and make extensible the config_search() / config_found() /
config_attach() interfaces: rather than having different variants for
which arguments you want to pass along, just have a single call that
takes a variadic list of tag-value arguments.

Adjust all call sites:
- Simplify wherever possible; don't pass along arguments that aren't
actually needed.
- Don't be explicit about what interface attribute is attaching if
the device only has one. (More simplification.)
- Add a config_probe() function to be used in indirect configuration
situations, making it visibly easier to see when indirect config is
in play, and allowing for future change in semantics. (As of now,
this is just a wrapper around config_match(), but that is an
implementation detail.)

Remove unnecessary or redundant interface attributes where they're not
needed.

There are currently 5 "cfargs" defined:
- CFARG_SUBMATCH (submatch function for direct config)
- CFARG_SEARCH (search function for indirect config)
- CFARG_IATTR (interface attribute)
- CFARG_LOCATORS (locators array)
- CFARG_DEVHANDLE (devhandle_t - wraps OFW, ACPI, etc. handles)

...and a sentinel value CFARG_EOL.
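
The tag-value calling convention described above can be sketched roughly as
follows. Everything prefixed demo_ or DEMO_ is hypothetical and exists only
for this illustration; the real CFARG_* tags and the config_found()
signature are defined in the NetBSD tree and are not reproduced here:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tags; DEMO_CFARG_EOL plays the role of the sentinel. */
enum demo_cfarg {
	DEMO_CFARG_EOL = 0,
	DEMO_CFARG_IATTR,	/* followed by a const char * */
	DEMO_CFARG_LOCATORS	/* followed by a const int * */
};

struct demo_cfargs {
	const char *iattr;
	const int *locators;
};

/*
 * Walk a variadic list of (tag, value) pairs until the sentinel tag is
 * seen. Tags that the caller does not pass keep their NULL defaults.
 */
static struct demo_cfargs
demo_cfargs_parse(int first_tag, ...)
{
	struct demo_cfargs args = { NULL, NULL };
	va_list ap;
	int tag;

	va_start(ap, first_tag);
	for (tag = first_tag; tag != DEMO_CFARG_EOL; tag = va_arg(ap, int)) {
		switch (tag) {
		case DEMO_CFARG_IATTR:
			args.iattr = va_arg(ap, const char *);
			break;
		case DEMO_CFARG_LOCATORS:
			args.locators = va_arg(ap, const int *);
			break;
		default:
			assert(!"unknown tag");
		}
	}
	va_end(ap);
	return args;
}
```

A caller that only cares about the interface attribute would pass
DEMO_CFARG_IATTR, a string, and then DEMO_CFARG_EOL; new tags can be added
later without touching existing call sites, which appears to be the
extensibility the commit is after.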

Add some extra sanity checking to ensure that interface attributes
aren't ambiguous.

Use CFARG_DEVHANDLE in MI FDT, OFW, and ACPI code, and macppc and shark
ports to associate those device handles with device_t instance. This
will trickle through to more places over time (need back-end for pre-OFW
Sun OBP; any others?).


Revision tags: thorpej-cfargs-base thorpej-futex-base
# 1.199 09-Oct-2020 christos

branches: 1.199.4;
Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), it seems the improvement is very small.
- The rdtscp instruction can be used as serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
of the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
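
The "measure many times, keep the lowest value" step can be sketched as
below. The names and the sign convention (plain signed comparison, no
absolute value) are assumptions made for this note; x86/tsc.c may do this
differently:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the commit text: error threshold of 1000 clock cycles. */
#define DEMO_TSC_ERROR_THRESHOLD	1000

/* Return the lowest of n observed samples (n must be at least 1). */
static int64_t
demo_lowest_observed(const int64_t *samples, size_t n)
{
	int64_t best;
	size_t i;

	assert(n > 0);
	best = samples[0];
	for (i = 1; i < n; i++) {
		if (samples[i] < best)
			best = samples[i];
	}
	return best;
}

/* True when the chosen sample is within the assumed error threshold. */
static bool
demo_skew_acceptable(int64_t skew)
{
	return skew <= DEMO_TSC_ERROR_THRESHOLD;
}
```

Taking the minimum discards samples inflated by bus traffic or interrupts,
so a single clean measurement is enough to get a usable estimate.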


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take TSC value from CPUID. Some processors
can take TSC/core crystal clock ratio but core crystal clock frequency
can't be taken. Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
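
As a rough sketch of the CPUID 0x15 calculation: the leaf reports the
TSC/core crystal clock ratio as a numerator (EBX) and denominator (EAX),
plus the crystal frequency in Hz (ECX) when the processor enumerates it.
demo_tsc_freq_from_cpuid15() is a hypothetical helper for this note, not
the function added in identcpu_subr.c:

```c
#include <assert.h>
#include <stdint.h>

/*
 * TSC frequency in Hz = crystal_hz * numerator / denominator.
 * Returns 0 when any input is not enumerated (reported as 0); in that
 * case the caller has to supply the crystal frequency some other way,
 * for example from the per-processor values listed in the Intel SDM.
 */
static uint64_t
demo_tsc_freq_from_cpuid15(uint32_t denominator, uint32_t numerator,
    uint64_t crystal_hz)
{
	if (denominator == 0 || numerator == 0 || crystal_hz == 0)
		return 0;
	/* Guard against overflow of the 64-bit intermediate product. */
	assert(crystal_hz <= UINT64_MAX / numerator);
	return crystal_hz * numerator / denominator;
}
```

For example, a 24 MHz crystal with a ratio of 176/2 works out to a TSC
frequency of 2112 MHz.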


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

    fpu_kern_enter();
    /* do FPU stuff */
    fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
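
The two steps above can be sketched as a single-threaded simulation. This is only an illustration under stated assumptions, not the kCSan implementation: all `sim_*` names are hypothetical, the delay is modelled by an explicit `sim_release()` call instead of a real 10ms stall, and read/read overlaps are assumed not to count as races.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_NCPU	2
#define SIM_NACCESSES	2000	/* mirrors KCSAN_NACCESSES from the log */

/* One "cell" per CPU, describing the access that CPU is stalled on. */
struct sim_cell {
	bool		active;
	uintptr_t	addr;
	size_t		size;
	bool		is_write;
};

static struct sim_cell sim_cells[SIM_NCPU];
static unsigned sim_count[SIM_NCPU];

/*
 * Model one memory access by 'cpu'. Returns true if it collides with a
 * cell owned by another CPU and at least one side is a write.
 */
static bool
sim_access(int cpu, uintptr_t addr, size_t size, bool is_write)
{
	for (int i = 0; i < SIM_NCPU; i++) {
		const struct sim_cell *c = &sim_cells[i];

		if (i == cpu || !c->active)
			continue;
		if (addr < c->addr + c->size && c->addr < addr + size &&
		    (is_write || c->is_write))
			return true;
	}

	/* Every SIM_NACCESSES-th access is selected and gets a cell. */
	if (++sim_count[cpu] % SIM_NACCESSES == 0)
		sim_cells[cpu] = (struct sim_cell){ true, addr, size, is_write };

	return false;
}

/* Stands in for the end of the delay: the CPU drops its cell. */
static void
sim_release(int cpu)
{
	sim_cells[cpu].active = false;
}
```

In this model a race is only observable while the selected CPU is still inside its delay window, which is why a longer delay or a smaller sampling interval would raise the detection rate at the cost of speed.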

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
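
The difference between the two forms can be illustrated with a small sketch. The names below are local stand-ins written for this example, not the libkern definitions, and the example assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Stand-in for a function defined on unsigned int, as described above. */
static unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-generic macro form: no conversion, but evaluates arguments twice. */
#define SKETCH_MIN(a, b)	((a) < (b) ? (a) : (b))

/* Passing 64-bit values silently drops the upper 32 bits of each. */
static uint64_t
sketch_clamp_truncating(uint64_t len, uint64_t limit)
{
	return sketch_uimin(len, limit);
}

/* The macro compares the full 64-bit values. */
static uint64_t
sketch_clamp_macro(uint64_t len, uint64_t limit)
{
	return SKETCH_MIN(len, limit);
}
```

With len = 0x100000001 and limit = 0x200000000, the function form compares 1 against 0 after conversion, while the macro form compares the original values.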

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

    	jmp	1f
    	int3
    	int3
    	int3
    	... int3 ...
    1:

gets hotpatched to:

    	movq	SVS_UTLS+UTLS_KPDIRPA,%rax
    	movq	%rax,%cr3
    	movq	CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
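
The slot-to-address arithmetic can be checked with a short sketch. It assumes standard 4-level amd64 paging, where each L4 slot covers 2^39 bytes and bit 47 is sign-extended into bits 48-63 to form a canonical address. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: base virtual address covered by a given L4 slot
 * under 4-level paging (512 slots of 512 GiB each).
 */
static uint64_t
sketch_l4_slot_base(unsigned int slot)
{
	uint64_t va;

	assert(slot < 512);

	va = (uint64_t)slot << 39;

	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;

	return va;
}
```

Slot 384 is 0x180, so shifting by 39 sets bits 47 and 46, and sign extension yields the address quoted above.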

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with
AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.
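
The relationship between the base, extended and display values can be sketched as below. These are illustrative functions with hypothetical names, not the macros from the NetBSD headers. They follow the usual CPUID leaf 1 EAX layout (stepping in bits 3:0, base model in 7:4, base family in 11:8, extended model in 19:16, extended family in 27:20) and the common rule that the extended family is added only when the base family is 0xf, and the extended model is used only when the base family is 0x6 or 0xf.

```c
#include <stdint.h>

/* Raw field extraction from the CPUID leaf 1 EAX signature. */
static unsigned int sketch_stepping(uint32_t sig)    { return sig & 0xf; }
static unsigned int sketch_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned int sketch_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned int sketch_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned int sketch_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field only counts when base family is 0xf. */
static unsigned int
sketch_display_family(uint32_t sig)
{
	unsigned int family = sketch_base_family(sig);

	if (family == 0xf)
		family += sketch_ext_family(sig);
	return family;
}

/* Display model: the extended field forms the high nibble for 0x6 and 0xf. */
static unsigned int
sketch_display_model(uint32_t sig)
{
	unsigned int family = sketch_base_family(sig);
	unsigned int model = sketch_base_model(sig);

	if (family == 0x6 || family == 0xf)
		model |= sketch_ext_model(sig) << 4;
	return model;
}
```

Keeping the raw-field accessors separate from the display accessors mirrors the split between the BASE/EXT macros and the FAMILY/MODEL macros listed above.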

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by kernel and MMU.
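
The sizes quoted above follow directly from the address widths. A minimal
C sketch of the arithmetic (addressable_bytes() is a hypothetical helper for
illustration, not something from the NetBSD tree):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes addressable with the given
 * number of address bits.  The result needs a 64-bit type as soon as
 * bits >= 32, which is why PAE physical addresses no longer fit in a
 * 32-bit variable.
 */
static uint64_t
addressable_bytes(unsigned int bits)
{
	assert(bits < 64);	/* shifting by 64 or more is undefined in C */
	return (uint64_t)1 << bits;
}
```

With this helper, addressable_bytes(36) is 64 GiB and addressable_bytes(32)
is 4 GiB, matching the figures in the paragraph above.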

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
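
A minimal C sketch of the "greatest power of two which is a divisor" idea,
assuming nothing about how the commit itself implements it
(largest_pow2_divisor() is a hypothetical name used only here):

```c
#include <assert.h>

/*
 * Hypothetical helper: return the largest power of two that divides n.
 * For unsigned arithmetic, n & -n isolates the lowest set bit of n,
 * and that bit is exactly the largest power-of-two divisor.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);		/* zero has no lowest set bit */
	return n & -n;
}
```

For example, a desired colour count of 12 would be reduced to 4, and a count
that is already a power of two is returned unchanged.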


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.199 09-Oct-2020 christos

Don't do extra work finding the power of 2 for values we are not going to
use. Explain that cpu_hatch has not been called yet, so no cpu_probe either
so the cache info is 0 for AP's.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc using lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386(not amd64), it seems the improvement is very little.
- The rdtscp instruction can be used as serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
in the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC value from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
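
A minimal sketch of the CPUID 0x15 arithmetic, assuming the register layout
documented in the Intel SDM (EAX = ratio denominator, EBX = ratio numerator,
ECX = core crystal clock in Hz). The function name and the way the values are
passed in are hypothetical and are not taken from identcpu_subr.c:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the TSC frequency in Hz from the values
 * reported by CPUID leaf 0x15.  Returns 0 when any value is missing,
 * e.g. when the CPU reports the ratio but not the crystal frequency;
 * the caller would then need a per-model crystal value or calibration.
 */
static uint64_t
tsc_freq_from_cpuid15(uint32_t denominator, uint32_t numerator,
    uint64_t crystal_hz)
{
	if (denominator == 0 || numerator == 0 || crystal_hz == 0)
		return 0;
	/* Guard the multiplication against 64-bit overflow. */
	assert(crystal_hz <= UINT64_MAX / numerator);
	return crystal_hz * numerator / denominator;
}
```

As an illustration with made-up inputs, a 24 MHz crystal with a ratio of 176/2
gives 2112000000 Hz.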


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than before.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that the cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
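
The truncation hazard described above can be sketched in a few lines. This is an illustration only, assuming a 32-bit unsigned int (as on NetBSD's 64-bit ports); uimin_sketch and min_u64_sketch are hypothetical stand-ins, not the libkern definitions.

```c
#include <stdint.h>

/*
 * Stand-in for a min() defined on unsigned int: any wider argument is
 * converted to unsigned int before the comparison takes place.
 */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Width-preserving comparison, for contrast. */
static uint64_t
min_u64_sketch(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}
```

Passing a 64-bit value such as 2^32 + 5 to uimin_sketch together with 100 yields 5, because the high bits are dropped before comparing, whereas min_u64_sketch yields 100. Under the old name min, nothing at the call site hinted at that conversion; the uimin/uimax names at least make the operand type visible.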


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
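
For context, FXSAVE requires its 512-byte memory operand to be 16-byte aligned, so an address with rdi % 16 = 8 faults. The check below is a minimal sketch of that condition; fxsave_addr_ok and fxsave_area_sketch are hypothetical names and this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define FXSAVE_ALIGN_SKETCH	16u	/* required operand alignment */
#define FXSAVE_SIZE_SKETCH	512u	/* size of the FXSAVE area */

/* A correctly aligned save area, using the C11 alignment specifier. */
static _Alignas(16) unsigned char fxsave_area_sketch[FXSAVE_SIZE_SKETCH];

/* True if addr could be used as an FXSAVE operand without faulting. */
static bool
fxsave_addr_ok(uintptr_t addr)
{
	return (addr % FXSAVE_ALIGN_SKETCH) == 0;
}
```

An on-stack save area is only aligned if the stack pointer itself was aligned as the ABI expects at function entry, which is why a misaligned stack shows up as a fault here rather than at the point where the alignment was lost.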


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.
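
The placeholder/hotpatch idea above can be modelled on a plain byte buffer. This is a user-space sketch under stated assumptions, not the kernel's hotpatch code: it uses the two-byte short jump encoding (0xEB followed by an 8-bit displacement), and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_JMP_REL8	0xEB	/* jmp short, followed by a signed 8-bit offset */
#define OP_INT3		0xCC	/* breakpoint, traps if ever executed */

/*
 * Fill a len-byte site with "jump over the rest" plus int3 padding.
 * The displacement is relative to the end of the 2-byte jump.
 */
static int
fill_skip_placeholder(uint8_t *site, size_t len)
{
	if (len < 2 || len - 2 > 127)
		return -1;
	site[0] = OP_JMP_REL8;
	site[1] = (uint8_t)(len - 2);
	memset(site + 2, OP_INT3, len - 2);
	return 0;
}

/* Overwrite the placeholder; both chunks must have the exact same size. */
static int
hotpatch_chunk_sketch(uint8_t *site, size_t sitelen,
    const uint8_t *code, size_t codelen)
{
	if (sitelen != codelen)
		return -1;
	memcpy(site, code, codelen);
	return 0;
}
```

A real kernel would additionally have to make the text writable and keep other CPUs from executing the site while it is rewritten; none of that is modelled here.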


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.
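
The correspondence between L4 slot 384 and address 0xffffc00000000000 follows from 4-level paging arithmetic: each L4 slot covers 2^39 bytes, and bit 47 is sign-extended into the upper 16 bits. A small sketch, with hypothetical names that are not taken from the pmap code:

```c
#include <stdint.h>

#define L4_SHIFT_SKETCH		39	/* each L4 slot spans 512 GiB */
#define L4_NSLOTS_SKETCH	512u

/* Return the canonical base virtual address covered by an L4 slot. */
static uint64_t
l4_slot_to_va_sketch(unsigned int slot)
{
	uint64_t va;

	va = (uint64_t)(slot % L4_NSLOTS_SKETCH) << L4_SHIFT_SKETCH;
	/* Canonical form: copy bit 47 into bits 48-63. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

Slot 384 is 0x180, so the shift sets bits 46 and 47 (0x0000c00000000000), and sign extension gives 0xffffc00000000000.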


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of CPUs
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
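
To make the base/extended/display distinction concrete, here is a sketch of how the fields of a CPUID leaf 1 signature (%eax) are commonly decoded. It assumes the rule documented by Intel: the extended family is added only when the base family is 0xF, and the extended model is prepended when the base family is 0x6 or 0xF. The function names are hypothetical and the actual NetBSD macro definitions may differ in detail.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature. */
static unsigned int sig_stepping(uint32_t sig)   { return sig & 0xf; }
static unsigned int sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned int sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned int sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned int sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field only counts when the base is 0xF. */
static unsigned int
sig_display_family(uint32_t sig)
{
	unsigned int fam = sig_basefamily(sig);

	if (fam == 0xf)
		fam += sig_extfamily(sig);
	return fam;
}

/* Display model: the extended field forms the high nibble for 0x6 and 0xF. */
static unsigned int
sig_display_model(uint32_t sig)
{
	unsigned int fam = sig_basefamily(sig);
	unsigned int model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, the signature 0x000906ea decodes to family 0x6, model 0x9e, stepping 0xa, while 0x00800f11 decodes to family 0x17, model 0x01, stepping 0x1. Comparing only the base fields would treat both model numbers as incomplete, which is the kind of bug the display-value macros help avoid.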


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_load_pmap(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
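
The selection logic described above amounts to a feature test with a fallback. The following user-space sketch models it with function pointers so that it can be exercised without privileged instructions; every name in it is hypothetical, and the fake readers simply return fixed values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of what the CPU offers. */
struct counter_ops_sketch {
	bool has_msr;				/* models CPUID_MSR */
	uint64_t (*read_tsc_msr)(void);		/* models rdmsr(MSR_TSC) */
	uint64_t (*read_counter)(void);		/* models cpu_counter() */
};

/* Fixed-value stand-ins for the two hardware reads. */
static uint64_t fake_read_tsc_msr(void) { return 100; }
static uint64_t fake_read_counter(void) { return 200; }

/* Prefer the MSR read when available, fall back to the plain counter. */
static uint64_t
counter_serializing_sketch(const struct counter_ops_sketch *ops)
{
	if (ops->has_msr)
		return ops->read_tsc_msr();
	return ops->read_counter();
}
```

On a CPU without MSR support, the sketch never reaches the MSR reader, which models how the fallback avoids executing rdmsr on such hardware.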


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
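
The sizes quoted in this commit (36-bit physical addresses giving 64 GiB, 32-bit virtual addresses giving 4 GiB) and the need for 64-bit variables can be checked with a short sketch. The helper names are hypothetical and only illustrate the arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of bytes addressable with the given number of address bits (< 64). */
static uint64_t
addr_space_bytes_sketch(unsigned int bits)
{
	return UINT64_C(1) << bits;
}

/* True if a physical address survives a round trip through 32 bits. */
static bool
fits_in_32_bits_sketch(uint64_t pa)
{
	return (uint64_t)(uint32_t)pa == pa;
}
```

Any physical address at or above 4 GiB fails the second check, which is why a PAE kernel must carry physical addresses in 64-bit variables while bootloader-provided addresses such as bs_addr can stay 32 bits wide.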


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
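
For illustration only, a minimal sketch of the power-of-two round-up that
this kind of helper performs. The name roundup2_sketch is hypothetical and
is not the kernel macro itself; the sketch assumes the alignment is a
non-zero power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of m, where m must be a power of two.
 * Hypothetical stand-in written for this note, not the kernel's macro.
 */
static size_t
roundup2_sketch(size_t x, size_t m)
{
	/* The mask trick below only works for non-zero powers of two. */
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + m - 1) & ~(m - 1);
}
```

Open-coding the same expression at each call site hides the power-of-two
requirement; a named helper makes it visible.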


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes the
MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.
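
A minimal sketch of that computation, assuming an unsigned colour count.
The function name is hypothetical and this is not the code from the commit:
the greatest power of two dividing n is simply its lowest set bit.

```c
#include <assert.h>

/*
 * Return the greatest power of two that divides n, i.e. the lowest
 * set bit of n. Hypothetical helper written for illustration.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);		/* zero has no lowest set bit */
	return n & (~n + 1u);	/* same as n & -n in two's complement */
}
```

For example, a probed count of 24 colours would be reduced to 8, and a
count that is already a power of two is returned unchanged.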

This code might want to stay even after the cache probing code is fixed.


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.198 09-Aug-2020 christos

move lcall sniffer to x86_machdep since xen/pv has its own cpu.c


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc using lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc,
but it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than rdtsc's, so I suspect the difference
in the results comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC value from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the
case of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
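
A small sketch of the difference, assuming a 32-bit unsigned int. The names
uimin_sketch and MIN_SKETCH are hypothetical stand-ins written for this
note, not the libkern definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Function form: both arguments are converted to unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: compares in the operands' own type, so no truncation. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* Returns 1 if the function form truncates where the macro does not. */
static int
shows_truncation(void)
{
	uint64_t big = UINT64_C(0x100000001);	/* 2^32 + 1 */
	uint64_t small = 2;

	assert(sizeof(unsigned int) == 4);	/* assumption of this sketch */

	/* The conversion keeps only the low 32 bits: big becomes 1. */
	unsigned int f = uimin_sketch((unsigned int)big, (unsigned int)small);
	uint64_t m = MIN_SKETCH(big, small);

	return f == 1 && m == 2;
}
```

The casts are written out only to make the conversion visible; passing the
64-bit values directly to the function form would truncate them the same
way, silently.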

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.
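
The same-size replacement described above can be modelled in user space on a plain byte buffer. This is only a sketch under stated assumptions: `fill_placeholder` and `hotpatch` are hypothetical names, and a real kernel hotpatch also has to make the text writable and synchronize the instruction stream, which is ignored here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_JMP_SHORT 0xEB	/* x86 "jmp rel8" opcode */
#define OP_INT3      0xCC	/* x86 breakpoint opcode */

/*
 * Hypothetical helper: fill a region with "jump over the rest" followed
 * by int3 padding.  The rel8 displacement is counted from the end of the
 * two-byte jmp, so it is len - 2 and must fit in a signed byte.
 */
static int
fill_placeholder(uint8_t *buf, size_t len)
{
	if (len < 2 || len - 2 > 127)
		return -1;
	buf[0] = OP_JMP_SHORT;
	buf[1] = (uint8_t)(len - 2);
	memset(buf + 2, OP_INT3, len - 2);
	return 0;
}

/*
 * Hypothetical helper: overwrite the placeholder with the real code.
 * The patch is refused unless both chunks have exactly the same size,
 * which is the invariant the commit message points out.
 */
static int
hotpatch(uint8_t *dst, size_t dst_len, const uint8_t *src, size_t src_len)
{
	if (dst_len != src_len)
		return -1;
	memcpy(dst, src, src_len);
	return 0;
}
```

Padding with int3 rather than nop means that any stray jump into the unpatched region traps immediately instead of sliding into the following code.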

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
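
As a worked check of the slot-to-address arithmetic, the sketch below computes the base virtual address of an L4 slot. The function and macro names are hypothetical; it assumes 4-level paging with 48-bit virtual addresses, where each L4 entry covers 2^39 bytes and bit 47 is sign-extended to form a canonical address.

```c
#include <stdint.h>

#define L4_SHIFT 39	/* each L4 entry maps 2^39 bytes (512 GiB) */
#define VA_BITS  48	/* implemented virtual address bits, 4-level paging */

/*
 * Hypothetical helper: base virtual address of L4 slot `slot`
 * (0..511), sign-extended from bit 47 to make it canonical.
 */
static uint64_t
l4_slot_to_va(unsigned slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT;

	if (va & (UINT64_C(1) << (VA_BITS - 1)))
		va |= ~((UINT64_C(1) << VA_BITS) - 1);
	return va;
}
```

For slot 384 the shift gives 0x0000c00000000000; bit 47 is set, so the upper 16 bits are filled in and the result is 0xffffc00000000000, matching the address quoted above.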

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
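
The distinction between base, extended and display values can be sketched as below. The function names are hypothetical and are not the macros added by this commit; the field layout and the combination rules (extended family is added only when the base family is 0xf, extended model is prepended only for base family 0x6 or 0xf) follow the vendor CPUID documentation.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static unsigned base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }
static unsigned ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned stepping(uint32_t sig)    { return sig & 0xf; }

/* Display family: base family, plus extended family if base is 0xf. */
static unsigned
display_family(uint32_t sig)
{
	unsigned fam = base_family(sig);

	if (fam == 0xf)
		fam += ext_family(sig);
	return fam;
}

/* Display model: extended model forms the high nibble for families 6 and 0xf. */
static unsigned
display_model(uint32_t sig)
{
	unsigned fam = base_family(sig);
	unsigned model = base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= ext_model(sig) << 4;
	return model;
}
```

For example, the signature 0x000306a9 decodes to family 0x6, model 0x3a, stepping 9, and 0x00600f12 decodes to family 0x15, model 0x1, stepping 2.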


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
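
For reference, the operation that a roundup2()-style helper replaces is the usual power-of-two round-up mask trick. The sketch below uses a hypothetical function name so that it does not claim to be the system macro, and it assumes the alignment is a nonzero power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a roundup2()-style helper: round x up to
 * the next multiple of m, where m must be a nonzero power of two.
 * Adding m - 1 pushes x past the next boundary unless it is already
 * aligned; masking with ~(m - 1) then clears the low bits.
 */
static uint64_t
round_up_pow2(uint64_t x, uint64_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + (m - 1)) & ~(m - 1);
}
```

Rounding 5000 up to a 4096-byte boundary gives 8192, while an already aligned value such as 4096 is returned unchanged.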


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
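
A minimal sketch of the workaround described above, not the code committed
in this revision. The function name largest_pow2_divisor() is hypothetical;
it only illustrates that the greatest power of two dividing a nonzero count
is its lowest set bit.

```c
#include <assert.h>

/*
 * Return the greatest power of two that divides n.
 * Hypothetical helper for illustration; n must be nonzero.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	/* In unsigned arithmetic, ~n + 1 is the two's complement of n, */
	/* so the AND isolates the lowest set bit. */
	return n & (~n + 1u);
}
```

For example, a probed count of 24 colors would be reduced to 8, while a
count that is already a power of two is returned unchanged.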


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shut down other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.197 08-Aug-2020 christos

PR/55547: Dan Plassche: Fix BSD/OS binary emulation.
Centralize lcall sniffer and recognize the BSD/OS flavor.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read the TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and gave a big improvement,
but there is still room. I measured the effect of lfence, mfence, cpuid and
rdtscp. The impact on TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), it seems the improvement is very little.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
in the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
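
A rough sketch of the "measure 1000 times and pick the lowest observed
value" item above, under stated assumptions: it is not the tsc.c code, the
names skew_sample_fn, min_observed_skew() and skew_within_threshold() are
hypothetical, and "lowest" is read here as smallest magnitude. The
fake_sample() source exists only so the sketch can be exercised without
hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sampler: returns one skew measurement, in clock cycles. */
typedef int64_t (*skew_sample_fn)(void *arg);

static int64_t
abs64(int64_t v)
{
	return v < 0 ? -v : v;
}

/* Take nsamples measurements and keep the one of smallest magnitude. */
static int64_t
min_observed_skew(skew_sample_fn sample, void *arg, unsigned int nsamples)
{
	int64_t best;
	unsigned int i;

	assert(nsamples > 0);
	best = sample(arg);
	for (i = 1; i < nsamples; i++) {
		int64_t s = sample(arg);

		if (abs64(s) < abs64(best))
			best = s;
	}
	return best;
}

/* Nonzero if the skew magnitude does not exceed the error threshold. */
static int
skew_within_threshold(int64_t skew, int64_t threshold)
{
	return abs64(skew) <= threshold;
}

/* Illustration only: replays a fixed list of samples in order. */
struct fake_samples {
	const int64_t *v;
	unsigned int n;
	unsigned int next;
};

static int64_t
fake_sample(void *arg)
{
	struct fake_samples *f = arg;

	return f->v[f->next++ % f->n];
}
```

Keeping the smallest of many samples discards measurements inflated by
cache misses or bus traffic, which is why a single noisy reading no longer
decides whether the TSC is considered usable.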


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
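
A sketch of the CPUID 0x15 arithmetic, not the identcpu_subr.c code. Per
the Intel SDM, leaf 0x15 returns the ratio denominator in EAX, the
numerator in EBX and the core crystal clock in Hz in ECX (which may be 0).
The function name and the fallback_crystal_hz parameter are hypothetical;
the latter stands in for a per-model value taken from the SDM.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the TSC frequency in Hz from raw CPUID leaf 0x15 register
 * values. Returns 0 if the leaf does not enumerate enough information.
 * Hypothetical helper for illustration only.
 */
static uint64_t
tsc_freq_from_cpuid15(uint32_t eax_denom, uint32_t ebx_numer,
    uint32_t ecx_crystal_hz, uint64_t fallback_crystal_hz)
{
	uint64_t crystal;

	/* ECX == 0 means the crystal frequency is not reported. */
	crystal = ecx_crystal_hz != 0 ? ecx_crystal_hz : fallback_crystal_hz;
	if (eax_denom == 0 || ebx_numer == 0 || crystal == 0)
		return 0;

	/* Keep the 64-bit product below from overflowing. */
	assert(crystal <= UINT32_MAX);
	return crystal * ebx_numer / eax_denom;
}
```

With a 24 MHz crystal and a ratio of 176/2 this yields 2112 MHz. A return
value of 0 tells the caller to fall back to calibration.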


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that the cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
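
A small illustration of the truncation hazard described above, assuming a
32-bit unsigned int. uimin() here is a stand-in with the same shape as the
libkern function, not a copy of it; min_via_uimin(), min_via_macro() and
uimin_truncation_demo() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the libkern helper: defined on unsigned int only. */
static inline unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-generic macro form; evaluates its arguments more than once. */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* 64-bit arguments are silently converted to unsigned int here. */
static uint64_t
min_via_uimin(uint64_t a, uint64_t b)
{
	return uimin(a, b);
}

/* The macro compares at full width, so nothing is lost. */
static uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
	return MIN(a, b);
}

static void
uimin_truncation_demo(void)
{
	uint64_t big = UINT64_C(0x100000001);	/* 2^32 + 1 */
	uint64_t small = 5;

	/* big is truncated to 1 before the comparison. */
	assert(min_via_uimin(big, small) == 1);
	assert(min_via_macro(big, small) == 5);
}
```

The function form returns 1 rather than 5 because the larger argument loses
its upper 32 bits before the comparison, which is the behaviour the rename
is meant to make visible in the name.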


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.
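
A worked check of the "L4 slot 384 means address 0xffffc00000000000"
statement above, assuming standard 4-level amd64 paging (512 L4 slots,
each covering 2^39 bytes, 48-bit canonical addresses). The names L4_SHIFT,
L4_NSLOTS and l4_slot_to_va() are hypothetical, not the pmap macros.

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT	39	/* each L4 slot maps 2^39 bytes (512GB) */
#define L4_NSLOTS	512	/* 9 index bits at the top level */

/* Return the lowest canonical virtual address covered by an L4 slot. */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va;

	assert(slot < L4_NSLOTS);
	va = (uint64_t)slot << L4_SHIFT;
	/* Canonical form: bits 63..48 must be copies of bit 47. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

Slot 384 is 0x180, and 0x180 << 39 is 0x0000c00000000000. Bit 47 is set,
so sign extension gives 0xffffc00000000000.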


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.
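
The layout idea described above (per-CPU temporary VA slots embedded in the
per-CPU structure and aligned to a cache line) can be sketched in plain C11.
This is only an illustration: the 64-byte line size, the slot count and all
names below are assumptions, not the actual cpu_info layout.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64	/* assumed cache-line size */
#define TMP_VA_COUNT	4	/* hypothetical number of tmp VA slots */

/* Per-CPU temporary VA slots; the block starts on its own cache line. */
struct tmp_va_slots {
	_Alignas(CACHE_LINE_SIZE) uintptr_t va[TMP_VA_COUNT];
};

/*
 * Hypothetical per-CPU structure. One instance exists per attached CPU,
 * so memory use scales with the CPUs present rather than with maxcpus.
 */
struct percpu_state {
	int			cpu_index;
	struct tmp_va_slots	tmp;	/* never shares a line with a neighbour */
};
```

Because the aligned member forces padding, two CPUs updating their own slots
do not write to the same cache line.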


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so that it does not break the dmesg of
cpus with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
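
As a rough sketch of what "display family" and "display model" mean, the
following C helpers decode a CPUID leaf 1 EAX signature. The function names
are hypothetical and are not the macros added by this commit; the bit layout
and the combination rules follow the vendor CPUID documentation.

```c
#include <stdint.h>

/* Raw fields of the CPUID.1:EAX signature (hypothetical helper names). */
static inline uint32_t sig_stepping(uint32_t sig)    { return sig & 0xf; }
static inline uint32_t sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base family is 0xf. */
static inline uint32_t sig_display_family(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);

	return fam == 0xf ? fam + sig_ext_family(sig) : fam;
}

/* Display model: the extended model forms the high nibble for families 6 and 0xf. */
static inline uint32_t sig_display_model(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

For example, the signature 0x000906EA decodes to family 6, model 0x9E,
stepping 0xA, while 0x00800F11 decodes to family 0x17, model 0x01.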


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by the kernel and MMU.
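
To make the address arithmetic concrete, here is a small C sketch of how the
hardware splits a 32-bit virtual address under PAE (2 + 9 + 9 + 12 bits) and
of the 36-bit physical limit. The type and function names are illustrative
only; the pmap itself treats the four L2 pages as one contiguous level, as
noted further down in this message.

```c
#include <stdint.h>

/* Hardware view of a 32-bit virtual address under PAE (names are illustrative). */
struct pae_index {
	unsigned l3;		/* bits 31..30: one of 4 L3 (PDPT) entries */
	unsigned l2;		/* bits 29..21: one of 512 page directory entries */
	unsigned l1;		/* bits 20..12: one of 512 page table entries */
	unsigned offset;	/* bits 11..0:  byte offset within the 4 KiB page */
};

static struct pae_index pae_split(uint32_t va)
{
	struct pae_index ix;

	ix.l3 = (va >> 30) & 0x3;
	ix.l2 = (va >> 21) & 0x1ff;
	ix.l1 = (va >> 12) & 0x1ff;
	ix.offset = va & 0xfff;
	return ix;
}

/* Size of the physical address space with 36 address bits: 64 GiB. */
static uint64_t pae_phys_limit(void)
{
	return UINT64_C(1) << 36;
}
```

A physical address no longer fits in 32 bits, which is why it has to be
carried in a 64-bit type even though virtual addresses stay 32 bits wide.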

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.
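
The trade-off can be illustrated with a small standalone C sketch. The type
and function names below are hypothetical and are not the real PMF
definitions; they only show why a "pointer to const" typedef is less flexible
than a plain type.

```c
struct suspensor {
	int depth;
};

/* Old style: the typedef bakes in both the pointer and the const. */
typedef const struct suspensor *suspensor_cptr_t;

/* New style: a plain type; each user adds '*' and 'const' as needed. */
typedef struct suspensor suspensor_t;

/* Read-only access can be expressed with either style. */
static int peek_depth_old(suspensor_cptr_t s) { return s->depth; }
static int peek_depth(const suspensor_t *s)   { return s->depth; }

/*
 * Mutable access is only expressible with the non-pointer typedef:
 * there is no way to strip the 'const' from suspensor_cptr_t.
 */
static int push_depth(suspensor_t *s) { return ++s->depth; }
```

With the pointer typedef, writing "const suspensor_cptr_t" would make the
pointer itself const, not remove or add constness on the pointee, which is
the inflexibility the commit message refers to.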


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
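
For readers unfamiliar with the idiom, a power-of-two round-up is usually
written with a mask, as in the sketch below. The function name is
hypothetical and this is not a copy of the kernel macro; it only shows the
arithmetic such a helper performs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of align, where align must be a
 * non-zero power of two. Adding (align - 1) and masking off the low
 * bits avoids a division.
 */
static size_t round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + (align - 1)) & ~(align - 1);
}
```

For instance, rounding 13 up to a multiple of 8 gives 16, and a value that is
already aligned is returned unchanged.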


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
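
The "greatest power of two which is a divisor" computation can be done with a
single bit trick, sketched below in C. The function name is hypothetical and
the code is not taken from the commit; it only illustrates the arithmetic.

```c
#include <assert.h>

/*
 * Return the largest power of two that divides n (n must be non-zero).
 * n & -n isolates the lowest set bit, which is exactly that divisor;
 * the negation is spelled (~n + 1u) to stay within unsigned arithmetic.
 */
static unsigned largest_pow2_divisor(unsigned n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

A probed color count of 12 would therefore be reduced to 4, and 24 to 8,
while a count that is already a power of two is left unchanged.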


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.196 28-Jul-2020 fcambus

Use CPU_IS_PRIMARY macro in cpu_stop(), cpu_resume(), and cpu_get_tsc_freq()
on x86.

OK kamil@


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced cache problem and got big improvement, but it
still has room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386(not amd64), it seems the improvement is very little.
- The rdtscp instruction can be used as serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
of the result comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
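
A rough sketch of the "measure many times, keep the lowest value" idea from the list above, assuming each measurement is a non-negative cycle count. This is not the code in tsc.c; the callback type, the replay source and all function names are hypothetical, and only the 1000-cycle threshold is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback returning one skew measurement in clock cycles. */
typedef uint64_t (*skew_sample_fn)(void *cookie);

/* Take nsamples measurements (at least one) and return the lowest. */
static uint64_t
lowest_sample(skew_sample_fn sample, void *cookie, size_t nsamples)
{
	assert(nsamples >= 1);
	uint64_t best = sample(cookie);
	for (size_t i = 1; i < nsamples; i++) {
		uint64_t v = sample(cookie);
		if (v < best)
			best = v;
	}
	return best;
}

/* Error threshold named in the commit message: 1000 clock cycles. */
#define SKEW_THRESHOLD_CYCLES 1000u

static int
skew_exceeds_threshold(uint64_t skew)
{
	return skew > SKEW_THRESHOLD_CYCLES;
}

/* Replay source, used only to exercise the sketch deterministically. */
struct replay {
	const uint64_t *vals;
	size_t n;
	size_t pos;
};

static uint64_t
replay_sample(void *cookie)
{
	struct replay *r = cookie;
	return r->vals[r->pos++ % r->n];
}
```

Taking the minimum rather than the mean discards samples inflated by interrupts or cache misses, which can only add cycles to a measurement.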


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take TSC value from CPUID. Some processors
can take TSC/core crystal clock ratio but core crystal clock frequency
can't be taken. Intel SDM give us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
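
For reference, a minimal sketch of the arithmetic behind CPUID leaf 0x15, assuming the register layout given in the Intel SDM (EAX = ratio denominator, EBX = ratio numerator, ECX = core crystal clock in Hz). The function name is hypothetical and this is not the code in identcpu_subr.c.

```c
#include <stdint.h>

/*
 * Compute the TSC frequency in Hz from CPUID leaf 0x15 register values:
 *   eax = denominator of the TSC/core crystal clock ratio
 *   ebx = numerator of that ratio
 *   ecx = core crystal clock frequency in Hz (0 if not enumerated)
 * Returns 0 when the ratio or the crystal frequency is not available,
 * in which case the caller needs another source for the frequency.
 */
static uint64_t
tsc_freq_from_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
	if (eax == 0 || ebx == 0 || ecx == 0)
		return 0;
	/* Widen before multiplying so the product cannot overflow. */
	return (uint64_t)ecx * ebx / eax;
}
```

The zero return covers the case mentioned above where the ratio is reported but the crystal frequency is not, so a known per-processor value has to be substituted.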


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.
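
The collision check described above can be sketched roughly as follows. This is an illustration, not the kCSan implementation: the structure and function names are hypothetical, and the rule that at least one of the two accesses must be a write is an assumption borrowed from the usual definition of a data race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cell describing one sampled memory access. */
struct access_cell {
	bool      active;	/* a CPU is currently delayed on this cell */
	bool      write;	/* the sampled access was a write */
	unsigned  cpu;		/* CPU that created the cell */
	uintptr_t addr;		/* start address of the access */
	size_t    size;		/* length of the access in bytes */
};

/*
 * Return true if a new access collides with the cell: the cell is
 * active, it belongs to another CPU, at least one of the two accesses
 * is a write (assumed), and the byte ranges overlap.
 */
static bool
access_collides(const struct access_cell *c, unsigned cpu,
    uintptr_t addr, size_t size, bool write)
{
	assert(size > 0);
	if (!c->active || c->cpu == cpu)
		return false;
	if (!c->write && !write)
		return false;
	return addr < c->addr + c->size && c->addr < addr + size;
}
```

In the cpu0/cpu1 example, cpu0's selected read creates the cell and cpu1's write to the same address is the access that reports the collision.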

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
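
To illustrate the distinction drawn above, here is a small sketch assuming unsigned int is narrower than unsigned long long, as on the usual 64-bit ports. The names demo_uimin and DEMO_MIN are stand-ins written for this example, not the libkern definitions.

```c
#include <limits.h>

/* Stand-in for a min function defined on unsigned int. */
static unsigned int
demo_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Stand-in for a type-generic macro: it may evaluate its arguments
 * twice, but it compares them at their full width.
 */
#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))
```

Passing (unsigned long long)UINT_MAX + 6 and 100 to demo_uimin silently converts the first argument to 5, so the function returns 5, whereas DEMO_MIN on the same operands yields 100.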


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg output of
CPUs with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
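
The split between base, extended and display fields can be sketched as
follows. This follows the documented layout of the CPUID leaf 1 EAX
signature; the ex_* function names are hypothetical stand-ins and are not
the macros added by this commit.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static inline uint32_t ex_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t ex_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t ex_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t ex_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t ex_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base family is 0xf. */
static inline uint32_t
ex_family(uint32_t sig)
{
	uint32_t fam = ex_basefamily(sig);

	return fam == 0xf ? fam + ex_extfamily(sig) : fam;
}

/* Display model: ext model is the high nibble for base family 6 or 0xf. */
static inline uint32_t
ex_model(uint32_t sig)
{
	uint32_t fam = ex_basefamily(sig);
	uint32_t model = ex_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= ex_extmodel(sig) << 4;
	return model;
}
```

For example, the signature 0x000306a9 decodes to family 6, model 0x3a,
stepping 9, while 0x00800f11 decodes to family 0x17, model 1, stepping 1.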


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
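
For readers unfamiliar with the operation, a minimal sketch of rounding up
to a power-of-two boundary with a mask. The function name ex_roundup2 is
hypothetical; it illustrates the usual add-and-mask idiom and is not a copy
of the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of m; m must be a power of two. */
static size_t
ex_roundup2(size_t x, size_t m)
{
	/* The mask trick below is only valid for power-of-two m. */
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + m - 1) & ~(m - 1);
}
```

A value that is already aligned is returned unchanged, so ex_roundup2(8, 4)
is 8 while ex_roundup2(5, 4) is also 8.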


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.
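
The difference between the two orders can be sketched with a singly linked
list. The struct and function names below are hypothetical, and the walk to
the tail is only one possible way to append; the actual list handling in
cpu.c may differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU structure on a singly linked list. */
struct ex_cpu {
	int id;
	struct ex_cpu *next;
};

/* Head insertion: a later traversal sees the newest CPU first. */
static void
ex_link_head(struct ex_cpu **headp, struct ex_cpu *ci)
{
	ci->next = *headp;
	*headp = ci;
}

/* Tail insertion: a later traversal sees CPUs in attach order. */
static void
ex_link_tail(struct ex_cpu **headp, struct ex_cpu *ci)
{
	struct ex_cpu **linkp = headp;

	/* Walk to the last link, which is the one holding NULL. */
	while (*linkp != NULL)
		linkp = &(*linkp)->next;
	ci->next = NULL;
	*linkp = ci;
}
```

Attaching CPUs 0, 1, 2 with ex_link_tail() yields 0, 1, 2 on traversal,
whereas ex_link_head() yields 2, 1, 0.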


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
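
The "greatest power of two which is a divisor" computation can be sketched
by isolating the lowest set bit. The function name ex_pow2_divisor is
hypothetical, and this is one common way to compute the value, not
necessarily the expression used in the commit.

```c
/*
 * Largest power of two that divides n (for n > 0). In two's complement,
 * n & (~n + 1) keeps only the lowest set bit of n, which is exactly that
 * divisor. For n == 0 the result is 0.
 */
static unsigned int
ex_pow2_divisor(unsigned int n)
{
	return n & (~n + 1u);
}
```

With 24 desired colours this gives 8; with 12 it gives 4; an odd count
collapses to 1.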


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
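
A userland sketch of adjusting a flags word atomically, using C11
<stdatomic.h> as a stand-in for the kernel's own atomic operations. The
structure, the flag values and the ex_* helpers are hypothetical and only
illustrate the read-modify-write pattern.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits; the real CPUF_* values are not reproduced here. */
#define EX_CPUF_PRESENT	0x1000u
#define EX_CPUF_RUNNING	0x2000u

struct ex_cpu_info {
	_Atomic uint32_t ci_flags;
};

/* Set bits without losing concurrent updates made by other CPUs. */
static void
ex_flags_set(struct ex_cpu_info *ci, uint32_t bits)
{
	atomic_fetch_or(&ci->ci_flags, bits);
}

/* Clear bits with the same guarantee. */
static void
ex_flags_clear(struct ex_cpu_info *ci, uint32_t bits)
{
	atomic_fetch_and(&ci->ci_flags, ~bits);
}
```

A plain "ci_flags |= bits" is a separate load and store, so two CPUs
updating different bits at the same time could overwrite each other; the
atomic read-modify-write avoids that without taking a lock.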


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.
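
The lock-prefix patching mentioned above can be sketched on a plain byte
buffer. This assumes a table of offsets at which lock prefixes were
recorded; that table, the ex_* names and the ncpu parameter are
hypothetical, and the kernel's real patching mechanism is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_OP_LOCK	0xf0	/* x86 lock prefix */
#define EX_OP_NOP	0x90	/* x86 one-byte nop */

/*
 * On a uniprocessor system, overwrite each recorded lock prefix with a
 * nop. Returns the number of bytes patched. Offsets that are out of range
 * or do not hold a lock prefix are skipped.
 */
static size_t
ex_patch_lock_prefixes(uint8_t *text, size_t len, const size_t *offs,
    size_t noffs, int ncpu)
{
	size_t i, patched = 0;

	if (ncpu > 1)
		return 0;	/* MP: the lock prefixes are still needed */

	for (i = 0; i < noffs; i++) {
		if (offs[i] < len && text[offs[i]] == EX_OP_LOCK) {
			text[offs[i]] = EX_OP_NOP;
			patched++;
		}
	}
	return patched;
}
```

In a kernel the patched region is live code, which is why the commit
follows the patching with a wbinvd; the sketch only models the byte
rewrite.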


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.195 14-Jul-2020 yamaguchi

Introduce per-cpu IDTs

This is realized by following modifications:
- Add IDT pages and its allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach
- These are, for example, exceptions, db, system calls, etc.

And, added a kernel option named PCPU_IDT to enable the feature.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and brought a big
improvement, but there is still room for more. I measured the effect of
lfence, mfence, cpuid and rdtscp. The impact on TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), it seems the improvement is very little.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
in the results comes from it.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
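
The "measure 1000 times and pick the lowest observed value" step in the list
above can be sketched roughly as follows. This is an illustration only, not
the kernel's code: lowest_skew() is a hypothetical name, the samples are
assumed to have been collected already as signed cycle differences, and
comparing by magnitude is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the NetBSD tree): given n signed skew
 * samples in clock cycles, return the smallest magnitude observed.
 * Measurement noise (interrupts, cache misses) can only inflate a sample,
 * so the minimum is the best available estimate.
 * Returns UINT64_MAX when n is 0.
 */
uint64_t
lowest_skew(const int64_t *samples, size_t n)
{
	uint64_t best = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		int64_t s = samples[i];
		/* Negate in unsigned arithmetic so INT64_MIN is safe. */
		uint64_t mag = s < 0 ? -(uint64_t)s : (uint64_t)s;

		if (mag < best)
			best = mag;
	}
	return best;
}
```

A caller would then compare the result against the error threshold
(1000 clock cycles in this revision).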


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It was also required to change lapic_per_second to make the LAPIC timer
work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
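
As a rough illustration of the CPUID 0x15 arithmetic mentioned above (a
sketch, not the code in identcpu_subr.c): the leaf returns the denominator
of the TSC/crystal ratio in EAX, the numerator in EBX and, when enumerated,
the crystal frequency in Hz in ECX. tsc_freq_from_cpuid15() and its
fallback_crystal_hz parameter are hypothetical names; the fallback stands in
for a per-model value looked up from the SDM.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the TSC frequency in Hz from the
 * CPUID leaf 0x15 register values.
 *   eax = denominator of the TSC/core crystal clock ratio
 *   ebx = numerator of the TSC/core crystal clock ratio
 *   ecx = core crystal clock frequency in Hz, or 0 if not enumerated
 * fallback_crystal_hz is used when ecx is 0 (assumed to come from a
 * per-model table). Returns 0 if the frequency cannot be derived.
 */
uint64_t
tsc_freq_from_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx,
    uint64_t fallback_crystal_hz)
{
	uint64_t crystal_hz = ecx != 0 ? ecx : fallback_crystal_hz;

	if (eax == 0 || ebx == 0 || crystal_hz == 0)
		return 0;
	return crystal_hz * ebx / eax;
}
```

For example, a ratio of 176/2 with a 24 MHz crystal gives 2112 MHz.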


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that the cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.
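
To illustrate the truncation this rename is meant to make visible, here is a
small sketch. It assumes a platform where unsigned int is 32 bits wide;
uimin_sketch() mirrors the shape of the libkern helper but is a stand-in
written for this example, and both clamp_len_* functions are hypothetical
callers.

```c
#include <stdint.h>

/* Stand-in with the same shape as the helper: unsigned int only. */
static inline unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical caller passing 64-bit values: the arguments are
 * implicitly converted to unsigned int, so the high 32 bits are
 * dropped before the comparison happens.
 */
uint64_t
clamp_len_truncating(uint64_t len, uint64_t limit)
{
	return uimin_sketch(len, limit);
}

/* The same clamp done at full width, for comparison. */
uint64_t
clamp_len_exact(uint64_t len, uint64_t limit)
{
	return len < limit ? len : limit;
}
```

With len = 0x100000001 and limit = 5, the truncating version compares 1
against 5 and returns 1, while the full-width version returns 5.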

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only the xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
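
The slot-to-address correspondence can be checked with a short calculation.
The sketch below assumes 4-level paging with 48-bit virtual addresses, where
each L4 slot covers 2^39 bytes and bit 47 is sign-extended into the upper 16
bits; l4_slot_to_va() and L4_SHIFT_SKETCH are hypothetical names, not
kernel identifiers.

```c
#include <stdint.h>

#define L4_SHIFT_SKETCH	39	/* each L4 slot spans 2^39 bytes (512GB) */

/*
 * Hypothetical helper: return the canonical virtual address at which
 * the given L4 slot (0..511) begins, assuming 48-bit virtual addresses.
 */
uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT_SKETCH;

	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

Slot 384 is 0x180, and 0x180 << 39 is 0xc00000000000; sign extension then
yields 0xffffc00000000000.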

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
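
As a sketch of how the base and extended fields combine into the display
values (not the actual macros from the tree): the SK_* names and the two
functions below are hypothetical, the bit positions are those of the CPUID
leaf 1 EAX signature, and the combination rule assumed here is the one from
the Intel SDM (extended family added only when the base family is 0xf,
extended model prepended when the base family is 0x6 or 0xf).

```c
#include <stdint.h>

/* Hypothetical field extractors for the CPUID leaf 1 EAX signature. */
#define SK_STEPPING(id)		((id) & 0xf)
#define SK_BASEMODEL(id)	(((id) >> 4) & 0xf)
#define SK_BASEFAMILY(id)	(((id) >> 8) & 0xf)
#define SK_EXTMODEL(id)		(((id) >> 16) & 0xf)
#define SK_EXTFAMILY(id)	(((id) >> 20) & 0xff)

/* Display family: add the extended family only when base family is 0xf. */
uint32_t
sk_display_family(uint32_t id)
{
	uint32_t family = SK_BASEFAMILY(id);

	if (family == 0xf)
		family += SK_EXTFAMILY(id);
	return family;
}

/* Display model: prepend the extended model for base family 0x6 or 0xf. */
uint32_t
sk_display_model(uint32_t id)
{
	uint32_t family = SK_BASEFAMILY(id);
	uint32_t model = SK_BASEMODEL(id);

	if (family == 0x6 || family == 0xf)
		model |= SK_EXTMODEL(id) << 4;
	return model;
}
```

For the signature 0x000906ea this gives family 0x6 and model 0x9e; for
0x00800f11 it gives family 0x17 and model 0x01.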


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
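
To illustrate why a hardcoded bitmask caps the CPU count, here is a minimal sketch of a multi-word CPU set. It is not the kcpuset(9) implementation; the sk_* names and the fixed 256-CPU size are assumptions chosen to match the limit mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_MAXCPUS	256	/* assumed limit, matching the amd64 default above */
#define SK_WORDBITS	32

/* A single uint32_t mask can name only CPUs 0..31; an array of words cannot overflow that way. */
struct sk_cpuset {
	uint32_t bits[SK_MAXCPUS / SK_WORDBITS];
};

/* Mark CPU index `cpu` as a member of the set. */
static void
sk_cpuset_set(struct sk_cpuset *s, unsigned int cpu)
{
	s->bits[cpu / SK_WORDBITS] |= UINT32_C(1) << (cpu % SK_WORDBITS);
}

/* Test whether CPU index `cpu` is a member of the set. */
static bool
sk_cpuset_isset(const struct sk_cpuset *s, unsigned int cpu)
{
	return (s->bits[cpu / SK_WORDBITS] >> (cpu % SK_WORDBITS)) & 1u;
}
```

Callers are expected to pass an index below SK_MAXCPUS; the sketch does no bounds checking.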


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
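
For readers unfamiliar with the idiom, here is a minimal sketch of the power-of-two round-up that such a helper performs. sk_roundup2() is a hypothetical function written for this example, not the kernel macro itself, and it assumes the alignment `m` is a nonzero power of two.

```c
#include <stdint.h>

/*
 * Round x up to the next multiple of m, where m must be a power of two.
 * Adding (m - 1) pushes x past the next boundary unless it is already
 * aligned; masking with ~(m - 1) then clears the low bits.
 */
static uintptr_t
sk_roundup2(uintptr_t x, uintptr_t m)
{
	return (x + (m - 1)) & ~(m - 1);
}
```

Naming the operation keeps the power-of-two assumption visible at the call site, whereas open-coded masking hides it.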


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
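
The workaround can be sketched with the usual lowest-set-bit trick: the greatest power of two that divides a number is its lowest set bit. sk_pow2_divisor() is a hypothetical name for this example, and the actual commit may compute the value differently.

```c
/*
 * Return the greatest power of two that divides ncolors.
 * (~n + 1) is the two's-complement negation of n; ANDing it with n
 * isolates the lowest set bit. Returns 0 for an input of 0.
 */
static unsigned int
sk_pow2_divisor(unsigned int ncolors)
{
	return ncolors & (~ncolors + 1u);
}
```

For example, a cache that probes as 12 colors would be treated as having 4, since 4 is the largest power of two dividing 12.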


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
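
A minimal sketch of the idea, using C11 <stdatomic.h> as a portable stand-in for the kernel's own atomic operations. The structure, flag names and helpers are hypothetical and written only for this example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define SK_CPUF_PRESENT	0x1u
#define SK_CPUF_RUNNING	0x2u

struct sk_cpu {
	_Atomic uint32_t flags;
};

/* Set bits with one atomic read-modify-write, so concurrent updaters cannot lose each other's bits. */
static void
sk_cpu_setflags(struct sk_cpu *ci, uint32_t bits)
{
	atomic_fetch_or(&ci->flags, bits);
}

/* Clear bits atomically. */
static void
sk_cpu_clearflags(struct sk_cpu *ci, uint32_t bits)
{
	atomic_fetch_and(&ci->flags, ~bits);
}
```

A plain `flags |= bits` is a separate load and store, so two CPUs updating different bits at the same time could overwrite each other's change. The atomic form avoids that without taking a lock.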


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.194 15-Jun-2020 msaitoh

Serialize rdtsc with lfence, mfence or cpuid to read the TSC more precisely.

x86/x86/tsc.c rev. 1.67 reduced the cache problem and gave a big improvement,
but there is still room. I measured the effect of lfence, mfence, cpuid and rdtscp.
The impact to TSC skew and/or drift is:

AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid.

NOTE:
- An AMD document says the DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
- On Intel i386 (not amd64), the improvement seems to be very small.
- The rdtscp instruction can be used as a serializing instruction + rdtsc, but
it's not as good as [lm]fence. Both Intel's and AMD's documents say that
the latency of rdtscp is bigger than that of rdtsc, so I suspect the
difference in the results comes from that.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use that to compute an accurate value before
attaching the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
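
The "measure many times, keep the lowest value" step can be sketched as below. This is not the code in x86/tsc.c: the callback type, the replay sampler and all sk_* names are hypothetical, and treating "lowest" as "smallest magnitude" for signed skew values is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that returns one skew measurement in clock cycles. */
typedef int64_t (*sk_sample_fn)(void *cookie);

/*
 * Take `rounds` measurements and keep the one with the smallest magnitude.
 * Noise such as cache misses or bus traffic only inflates a sample, so the
 * smallest one is the closest to the true value. Returns INT64_MAX if
 * rounds is 0.
 */
static int64_t
sk_best_sample(sk_sample_fn sample, void *cookie, size_t rounds)
{
	int64_t best = INT64_MAX;

	for (size_t i = 0; i < rounds; i++) {
		int64_t v = sample(cookie);
		int64_t mag = v < 0 ? -v : v;
		int64_t bestmag = best < 0 ? -best : best;

		if (mag < bestmag)
			best = v;
	}
	return best;
}

/* Demo sampler that replays a fixed list of values, cycling if needed. */
struct sk_replay {
	const int64_t *vals;
	size_t n;
	size_t pos;
};

static int64_t
sk_replay_next(void *cookie)
{
	struct sk_replay *r = cookie;

	return r->vals[r->pos++ % r->n];
}
```

With 1000 rounds and a 1000-cycle error threshold, as described above, a caller would compare the magnitude of the returned value against the threshold before trusting the TSC.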


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC value from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
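
As a sketch of the arithmetic, not the code added in identcpu_subr.c: CPUID leaf 0x15 reports the denominator of the TSC/crystal ratio in EAX, the numerator in EBX, and the core crystal clock frequency in Hz in ECX, any of which may read as zero. The function name and its "return 0 when unknown" convention are assumptions made for this example.

```c
#include <stdint.h>

/*
 * Compute the TSC frequency in Hz from CPUID leaf 0x15 register values:
 *   tsc_hz = crystal_hz * numerator / denominator
 * Returns 0 if any input is zero, i.e. the leaf does not give enough
 * information. A caller could then substitute a known crystal frequency
 * for its CPU model and retry.
 */
static uint64_t
sk_tsc_freq_from_cpuid15(uint32_t eax_denom, uint32_t ebx_numer,
    uint32_t ecx_crystal_hz)
{
	if (eax_denom == 0 || ebx_numer == 0 || ecx_crystal_hz == 0)
		return 0;
	/* Widen before multiplying so the product cannot overflow 32 bits. */
	return (uint64_t)ecx_crystal_hz * ebx_numer / eax_denom;
}
```

For example, a 24 MHz crystal with a ratio of 176/2 gives 2,112,000,000 Hz.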


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get the Hyper-V virtual processor ID in cpu_hatch().

Previously it was obtained in the config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the
case of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
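
A small sketch of the truncation hazard described above, assuming a platform
where unsigned int is 32 bits. The names uimin_sketch and MIN_SKETCH are
illustrative stand-ins, not the libkern definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Function form: both arguments are converted to unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no conversion, but arguments may be evaluated twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

static void
truncation_example(void)
{
	uint64_t big = (UINT64_C(1) << 32) + 5;	/* 0x100000005 */
	uint64_t limit = 100;

	/* big is silently truncated to 5 before the comparison. */
	assert(uimin_sketch(big, limit) == 5);
	/* The macro compares the full 64-bit values. */
	assert(MIN_SKETCH(big, limit) == 100);
}
```

With the function spelled uimin, the unsigned int in the name at least hints
at the conversion that a generic min would hide.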


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.
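
The same-size replacement can be modelled in user space as follows. This is
only a sketch under assumptions: the function names are hypothetical, the
stub is assumed to use the two-byte short jump (0xEB rel8), and everything a
real kernel hotpatch must handle (write protection, interrupts, instruction
stream serialization) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_JMP_SHORT	0xEB	/* jmp rel8 */
#define X86_INT3	0xCC

/* Fill a patch site with "jump over the rest" plus int3 padding. */
static void
fill_skip_stub(uint8_t *site, size_t len)
{
	assert(len >= 2 && len - 2 <= 127);
	site[0] = X86_JMP_SHORT;
	/* The displacement is relative to the end of the 2-byte jmp. */
	site[1] = (uint8_t)(len - 2);
	memset(site + 2, X86_INT3, len - 2);
}

/*
 * Overwrite the stub with the real code. The sizes must match exactly,
 * otherwise the bytes after the site would be corrupted or left stale.
 * Returns 0 on success, -1 on size mismatch.
 */
static int
hotpatch_site(uint8_t *site, size_t site_len, const uint8_t *code,
    size_t code_len)
{
	if (site_len != code_len)
		return -1;
	memcpy(site, code, code_len);
	return 0;
}
```

Refusing a size mismatch mirrors the requirement that both chunks be of the
exact same size.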

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
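
The slot-to-address arithmetic can be checked with a short sketch, assuming
4-level paging with 48-bit virtual addresses, where each L4 slot covers 2^39
bytes and addresses must be in canonical (sign-extended) form. The names
below are illustrative and not taken from the pmap headers.

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT_SKETCH	39	/* each L4 slot maps 2^39 bytes (512GB) */

/* Return the canonical base virtual address of an L4 slot (0..511). */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va;

	assert(slot < 512);
	va = (uint64_t)slot << L4_SHIFT_SKETCH;
	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

For slot 384 (0x180), the shift gives 0x0000c00000000000; bit 47 is set, so
sign extension yields 0xffffc00000000000.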

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
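
The difference between the base, extended and display values can be sketched
as follows. This is a sketch of the usual CPUID leaf 1 EAX layout as
documented by Intel, written as functions with illustrative names rather
than the actual macro definitions from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of CPUID leaf 1 EAX (the processor signature). */
static unsigned int sig_stepping(uint32_t sig)   { return sig & 0xf; }
static unsigned int sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned int sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned int sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned int sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base is 0xf. */
static unsigned int
sig_family(uint32_t sig)
{
	unsigned int fam = sig_basefamily(sig);

	if (fam == 0xf)
		fam += sig_extfamily(sig);
	return fam;
}

/* Display model: the extended model is used for base family 6 or 0xf. */
static unsigned int
sig_model(uint32_t sig)
{
	unsigned int fam = sig_basefamily(sig);
	unsigned int model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}

static void
signature_example(void)
{
	/* Base family 6, extended model 9, base model 0xe, stepping 0xa. */
	assert(sig_family(0x000906ea) == 0x6);
	assert(sig_model(0x000906ea) == 0x9e);
	assert(sig_stepping(0x000906ea) == 0xa);
	/* Base family 0xf plus extended family 8 gives family 0x17. */
	assert(sig_family(0x00800f11) == 0x17);
	assert(sig_model(0x00800f11) == 0x01);
}
```

Code that compared only the base fields would see family 0xf for every
processor whose real family lies in the extended field, which is the kind of
bug the shared display macros are meant to avoid.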


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.
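
The address widths quoted above can be checked with a little arithmetic.
The following is only an illustrative sketch; the helper names are
hypothetical and do not come from the kernel sources.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 * Number of bytes addressable with `bits` address bits (bits < 64).
 */
uint64_t
addressable_bytes(unsigned bits)
{
	return UINT64_C(1) << bits;
}

/* Convert a byte count to whole GiB (1 GiB = 2^30 bytes). */
uint64_t
bytes_to_gib(uint64_t bytes)
{
	return bytes >> 30;
}
```

With these, 36 physical address bits give 64 GiB and 32 virtual address
bits give 4 GiB, matching the figures in the message.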

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
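
For readers unfamiliar with the idiom, rounding up to a power-of-two
boundary is usually done with a mask. The sketch below is a stand-alone
illustration under that assumption; round_up_pow2() is a hypothetical
name, not the kernel macro itself.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a power-of-two round-up.
 * `align` must be a non-zero power of two; the result is the smallest
 * multiple of `align` that is >= x.
 */
static inline uintptr_t
round_up_pow2(uintptr_t x, uintptr_t align)
{
	return (x + (align - 1)) & ~(align - 1);
}
```

For example, rounding 13 up to a multiple of 8 yields 16, while a value
that is already aligned is returned unchanged.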


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
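
The workaround described above, taking the greatest power of two that
divides a given count, can be expressed with a single bit trick. This is
a minimal sketch under that reading of the message, not the committed
code; pow2_divisor() is a hypothetical name.

```c
/*
 * Hypothetical helper: return the greatest power of two that divides n.
 * n & -n isolates the lowest set bit, which is exactly that divisor.
 * Unsigned negation is well defined in C; for n == 0 the result is 0.
 */
unsigned int
pow2_divisor(unsigned int n)
{
	return n & (0u - n);
}
```

For instance, a desired count of 24 colors would be reduced to 8, and a
count of 12 to 4; a count that is already a power of two is unchanged.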


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.193 13-Jun-2020 ad

g/c vm_page_zero_enable


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
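
The "measure many times and keep the lowest value" approach mentioned in
the list above filters out samples inflated by interrupts or bus
contention. The sketch below illustrates the idea on an array of
already-collected samples; lowest_sample() is a hypothetical helper and
the real code gathers its measurements differently.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest of n measurements.
 * The caller must pass n >= 1. Noise can only add latency to a
 * measurement, so the minimum is the best available estimate.
 */
int64_t
lowest_sample(const int64_t *samples, size_t n)
{
	int64_t best = samples[0];

	for (size_t i = 1; i < n; i++) {
		if (samples[i] < best)
			best = samples[i];
	}
	return best;
}
```

A caller would then compare the returned value against its error
threshold (1000 clock cycles in the message above).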


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take TSC value from CPUID. On some
processors the TSC/core crystal clock ratio can be read but the core
crystal clock frequency cannot. The Intel SDM gives us the values for
some processors.
- It is also required to change lapic_per_second to make the LAPIC timer
work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
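
For reference, CPUID leaf 0x15 reports the TSC/crystal ratio as a
numerator (EBX) and denominator (EAX), plus the crystal frequency in Hz
(ECX) when the processor enumerates it. The sketch below shows the
arithmetic only; tsc_hz_from_cpuid15() is a hypothetical name, and the
register values are passed in rather than read with the cpuid
instruction.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the TSC frequency in Hz from the
 * CPUID 0x15 fields. Returns 0 when any field is zero, i.e. when the
 * ratio or the crystal frequency is not enumerated and the caller must
 * fall back to a table or to calibration.
 */
uint64_t
tsc_hz_from_cpuid15(uint32_t denom, uint32_t numer, uint32_t crystal_hz)
{
	if (denom == 0 || numer == 0 || crystal_hz == 0)
		return 0;
	/* Widen before multiplying to avoid 32-bit overflow. */
	return (uint64_t)crystal_hz * numer / denom;
}
```

As an illustrative input, a 24 MHz crystal with a ratio of 176/2 gives
2112000000 Hz.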


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Previously, it was obtained in config_interrupts context.
However, since it is required when attaching a device,
it is now obtained earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
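
The truncation hazard described above can be sketched in user space.
This is an illustration only, assuming a 32-bit unsigned int; the names
uimin_sketch, MIN_SKETCH and min_via_uint are hypothetical and are not
the libkern definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as a helper defined on unsigned int only (hypothetical name). */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: keeps the operand types, but evaluates each argument twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/*
 * Push two 64-bit values through the unsigned int helper.  The casts are
 * written out here; an implicit conversion at a call site would do the
 * same thing without any visible marker.
 */
static uint64_t
min_via_uint(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}
```

With a = 2^32 + 1 and b = 5, min_via_uint() sees a as 1 and returns 1,
while MIN_SKETCH() compares the full 64-bit values and yields 5.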


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.
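
FXSAVE requires its memory operand to be 16-byte aligned, which is why
rdi % 16 = 8 faults. A minimal sketch of that check, with a hypothetical
helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if addr is a multiple of 16, as an FXSAVE area must be. */
static int
is_aligned16(uintptr_t addr)
{
	return (addr & 0xf) == 0;
}
```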

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.
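
The same-size requirement can be illustrated with a user-space
simulation on a byte array. This is only a sketch: it does not touch
executable memory, and fill_placeholder and hotpatch_sketch are
hypothetical names, not the kernel's hotpatch code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_JMP_REL8	0xEB	/* x86 short jump, followed by a rel8 */
#define OP_INT3		0xCC	/* breakpoint, traps if ever executed */

/* Fill buf[0..len) with "jump over the rest" plus int3 padding. */
static void
fill_placeholder(uint8_t *buf, size_t len)
{
	assert(len >= 2 && len - 2 <= 127);	/* rel8 forward range */
	buf[0] = OP_JMP_REL8;
	buf[1] = (uint8_t)(len - 2);		/* skip the padding */
	memset(buf + 2, OP_INT3, len - 2);
}

/*
 * Overwrite the placeholder with the real bytes.  Refuse when the sizes
 * differ, since a shorter or longer body would corrupt the code around it.
 * Returns 0 on success, -1 on a size mismatch.
 */
static int
hotpatch_sketch(uint8_t *dst, size_t dstlen, const uint8_t *src, size_t srclen)
{
	if (dstlen != srclen)
		return -1;
	memcpy(dst, src, srclen);
	return 0;
}
```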

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
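
The slot-to-address arithmetic can be checked with a short sketch,
assuming 4-level paging with 48-bit virtual addresses, where each L4
entry covers 2^39 bytes. L4_SHIFT_SKETCH and l4_slot_to_va are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT_SKETCH	39	/* one L4 entry maps 2^39 bytes (512 GiB) */

/* Base virtual address of an L4 slot, in canonical (sign-extended) form. */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va;

	assert(slot < 512);
	va = (uint64_t)slot << L4_SHIFT_SKETCH;
	/* If bit 47 is set, bits 48..63 must be set as well. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

Slot 384 is 0x180, so the shift sets bits 47 and 46, and the sign
extension fills in the upper 16 bits.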

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
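
The relation between the base, extended and display fields can be
sketched as below. This follows the common CPUID leaf 1 convention (the
extended family is added when the base family is 0xf, and the extended
model is prepended when the base family is 0x6 or 0xf); the sig_* helpers
are hypothetical names, not the macros from this commit, and the two
signature values in the note are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 %eax signature. */
static unsigned int sig_stepping(uint32_t sig)   { return sig & 0xf; }
static unsigned int sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned int sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned int sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned int sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family only counts when the base is 0xf. */
static unsigned int
sig_display_family(uint32_t sig)
{
	unsigned int fam = sig_basefamily(sig);

	return fam == 0xf ? fam + sig_extfamily(sig) : fam;
}

/* Display model: the extended model forms the high nibble for 0x6 and 0xf. */
static unsigned int
sig_display_model(uint32_t sig)
{
	unsigned int fam = sig_basefamily(sig);
	unsigned int model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For the signature 0x000906ea this gives family 0x6, model 0x9e,
stepping 0xa; for 0x00800f11 it gives family 0x17, model 0x01,
stepping 0x1.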


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shutdown the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by kernel and MMU.
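
The sizes quoted above follow from the address widths. A small sketch of
the arithmetic, using hypothetical helper names, also shows why a
36-bit physical address no longer fits in a 32-bit variable:

```c
#include <assert.h>
#include <stdint.h>

#define GIB_SKETCH	(1ULL << 30)

/* Number of bytes addressable with the given number of address bits. */
static uint64_t
addr_space_bytes(unsigned int bits)
{
	assert(bits < 64);
	return 1ULL << bits;
}

/* Nonzero if the physical address survives a round trip through 32 bits. */
static int
fits_in_32_bits(uint64_t pa)
{
	return pa == (uint64_t)(uint32_t)pa;
}
```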

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
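
For illustration only, a minimal sketch of the power-of-two round-up idiom
that a macro such as roundup2() stands for. The helper name round_up_pow2 is
hypothetical and is not the kernel's definition; it assumes the alignment m
is a power of two.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's roundup2() itself: round x up to
 * the next multiple of m, where m must be a power of two.
 */
static uint64_t
round_up_pow2(uint64_t x, uint64_t m)
{
	/* Adding m - 1 and clearing the low bits rounds up without a division. */
	return (x + (m - 1)) & ~(m - 1);
}
```

For example, round_up_pow2(13, 8) yields 16, and a value that is already
aligned is returned unchanged.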


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@: http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
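
As a rough sketch of the workaround described above, the greatest power of
two dividing a number is its lowest set bit. The helper name
largest_pow2_divisor is hypothetical and this is not the code that was
committed.

```c
/*
 * Hypothetical helper: largest power of two that divides n.
 * Returns 0 for n == 0.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	/* n & -n isolates the lowest set bit, which is that divisor. */
	return n & (0u - n);
}
```

With 12 desired colors this gives 4, and with 24 it gives 8; a count that is
already a power of two is returned as-is.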


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.192 21-May-2020 ad

- Recalibrate the APIC timer using the TSC, once the TSC has in turn been
recalibrated using the HPET. This gets the clock interrupt firing more
closely to HZ.

- Undo change with recent Xen merge and go back to starting the clocks in
initclocks() on the boot CPU, and in cpu_hatch() on secondary CPUs.

- On reflection don't use HPET delay any more, it works very well but means
going over the bus. It's enough to use HPET to calibrate the TSC and
APIC.

Tested on amd64 native, xen and xen PVH.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It was also required to change lapic_per_second to make the LAPIC timer
work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
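
A minimal sketch of the CPUID leaf 0x15 arithmetic, for illustration. The
leaf reports the denominator (EAX) and numerator (EBX) of the TSC/core
crystal clock ratio and the crystal frequency in Hz (ECX). The function name
tsc_hz_from_cpuid15 is hypothetical, and the fallback table of known crystal
frequencies mentioned in the commit is not modeled here.

```c
#include <stdint.h>

/*
 * Hypothetical helper: TSC frequency in Hz from CPUID leaf 0x15 values.
 * Returns 0 when the leaf does not enumerate enough information.
 */
static uint64_t
tsc_hz_from_cpuid15(uint32_t denominator, uint32_t numerator,
    uint32_t crystal_hz)
{
	if (denominator == 0 || numerator == 0 || crystal_hz == 0)
		return 0;
	/* Widen first so the multiplication cannot overflow 32 bits. */
	return (uint64_t)crystal_hz * numerator / denominator;
}
```

For a 24 MHz crystal with a ratio of 176/2 this gives 2112000000 Hz. A zero
crystal frequency is the case where a caller would have to fall back to a
known value for the processor model.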


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
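
To make the truncation concern concrete, here is a small sketch. The names
narrow_min, WIDE_MIN and narrow_min_truncates are hypothetical stand-ins, not
the libkern functions, and the example assumes a 32-bit unsigned int.

```c
#include <stdint.h>

/* Hypothetical stand-in for a min() that is declared on unsigned int. */
static unsigned int
narrow_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro in the style of MIN: no truncation, but it
 * evaluates its arguments more than once. */
#define WIDE_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Returns 1 when narrow_min() gives a different answer than WIDE_MIN. */
static int
narrow_min_truncates(uint64_t a, uint64_t b)
{
	/* The 64-bit arguments are silently converted to unsigned int here. */
	return narrow_min(a, b) != WIDE_MIN(a, b);
}
```

With a = 0x100000001 and b = 2, the narrow version sees a as 1 and returns 1,
while the macro returns 2. For small values such as 5 and 7 the two agree.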


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
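
For reference, FXSAVE requires a 16-byte aligned memory operand, so an
address with a remainder of 8 faults. A minimal sketch of that check follows;
the helper name is_fxsave_aligned is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if addr is usable as an FXSAVE operand. */
static int
is_fxsave_aligned(uintptr_t addr)
{
	/* FXSAVE faults unless the save area is 16-byte aligned. */
	return (addr % 16) == 0;
}
```

An address such as 0x1008 has addr % 16 == 8, matching the failing case
reported above, so the check returns 0 for it.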


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
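
The slot-to-address arithmetic can be sketched as follows, assuming 4-level
paging with 48-bit canonical virtual addresses, where each L4 slot covers
2^39 bytes. The names L4_SHIFT and l4_slot_to_va are hypothetical and are not
the pmap macros.

```c
#include <stdint.h>

#define L4_SHIFT 39	/* each L4 slot covers 2^39 bytes (512 GB) */

/* Hypothetical helper: base virtual address of a given L4 slot. */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT;

	/* Canonical form: sign-extend bit 47 into bits 48..63. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

Slot 384 shifted left by 39 is 0x0000c00000000000, and sign extension turns
that into 0xffffc00000000000, the address quoted above.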

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
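
The split between base, extended and display fields can be sketched as
follows. This is a minimal illustration, not the actual NetBSD macros: the
ex_* names are hypothetical, and the combination rules are the ones commonly
documented for the CPUID leaf 1 signature (extended family is added only when
the base family is 0xf; extended model is used when the base family is 6 or
0xf).

```c
#include <stdint.h>

/* Hypothetical helpers (ex_ prefix); not the CPUID_TO_* macros themselves. */
static inline uint32_t ex_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t ex_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t ex_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t ex_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t ex_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field only counts when base family is 0xf. */
static inline uint32_t ex_display_family(uint32_t sig)
{
	uint32_t base = ex_basefamily(sig);

	return base == 0xf ? base + ex_extfamily(sig) : base;
}

/* Display model: extended model supplies the high nibble for family 6/0xf. */
static inline uint32_t ex_display_model(uint32_t sig)
{
	uint32_t base = ex_basefamily(sig);

	if (base == 0x6 || base == 0xf)
		return (ex_extmodel(sig) << 4) | ex_basemodel(sig);
	return ex_basemodel(sig);
}
```

With a signature of 0x000906ea this yields family 0x6, model 0x9e and
stepping 0xa; with 0x00800f11 it yields family 0x17 and model 0x01.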


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
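
Why a dynamic set lifts the CPU limit can be sketched like this. The code
below is not the kcpuset(9) API; it is a hypothetical multi-word bit set
(ex_* names and EX_MAXCPUS are assumptions) showing that once the mask is an
array of words rather than a single 32-bit integer, the CPU index is no
longer bounded by the word size.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAXCPUS	 256	/* assumed limit, for illustration only */
#define EX_BITS_PER_WORD (sizeof(uint32_t) * CHAR_BIT)
#define EX_NWORDS	 (EX_MAXCPUS / EX_BITS_PER_WORD)

/* A CPU set spread over several words instead of one hardcoded mask. */
struct ex_cpuset {
	uint32_t bits[EX_NWORDS];
};

static inline void ex_cpuset_set(struct ex_cpuset *s, unsigned cpu)
{
	s->bits[cpu / EX_BITS_PER_WORD] |= UINT32_C(1) << (cpu % EX_BITS_PER_WORD);
}

static inline void ex_cpuset_clear(struct ex_cpuset *s, unsigned cpu)
{
	s->bits[cpu / EX_BITS_PER_WORD] &= ~(UINT32_C(1) << (cpu % EX_BITS_PER_WORD));
}

static inline bool ex_cpuset_isset(const struct ex_cpuset *s, unsigned cpu)
{
	return (s->bits[cpu / EX_BITS_PER_WORD] >> (cpu % EX_BITS_PER_WORD)) & 1;
}
```

A single uint32_t mask cannot represent CPU 200; here it is simply bit 8 of
word 6.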


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
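
The selection logic described above amounts to a feature-gated fallback. A
minimal sketch, with hypothetical stand-ins (ex_* names) for the real
primitives; the stubs only return distinct values so that the chosen path is
observable, and has_msr stands in for the CPUID_MSR feature test:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for rdmsr(MSR_TSC); returns a marker value, not a real count. */
static uint64_t ex_rdmsr_tsc(void)
{
	return 1;
}

/* Stand-in for cpu_counter(); returns a different marker value. */
static uint64_t ex_cpu_counter(void)
{
	return 2;
}

/* Use the MSR read when the CPU supports rdmsr, else the plain counter. */
static uint64_t ex_counter_serializing(bool has_msr)
{
	return has_msr ? ex_rdmsr_tsc() : ex_cpu_counter();
}
```

Callers never issue rdmsr directly, so a CPU without MSR support does not
fault on the unsupported instruction.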


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
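
For reference, a power-of-two round-up is typically a mask operation. The
helper below is a hypothetical equivalent (ex_roundup2 is not the kernel
macro) and is only valid when the alignment is a power of two:

```c
#include <stdint.h>

/* Round x up to the next multiple of align; align must be a power of two. */
static inline uintptr_t ex_roundup2(uintptr_t x, uintptr_t align)
{
	return (x + align - 1) & ~(align - 1);
}
```

For example, ex_roundup2(13, 8) is 16 and ex_roundup2(4097, 4096) is 8192,
while values that are already aligned are returned unchanged.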


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
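
The rounding described above can be sketched as follows. This is only an
illustration under the assumption that the color count fits in 32 bits; the
helper name greatest_pow2_divisor() is hypothetical and is not the function
used in cpu.c. The greatest power of two that divides n is the lowest set
bit of n.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the code from cpu.c): return the greatest
 * power of two that divides n, which is the lowest set bit of n.
 * Returns 0 when n is 0.
 */
static inline uint32_t
greatest_pow2_divisor(uint32_t n)
{
	/* In unsigned arithmetic 0 - n == ~n + 1, so the AND keeps only the lowest set bit. */
	return n & (0u - n);
}
```

For example, a probed value of 12 colors would be reduced to 4, and 48 to 16;
a count that is already a power of two is returned unchanged.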


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.191 12-May-2020 msaitoh

Don't use TSC freq value from CPUID if calibration works.

- When it's the first call of cpu_get_tsc_freq() the HPET is not initialized,
so try to use CPUID to get TSC freq.
- If it's the 2nd call, don't use CPUID. Instead, print the difference
between the calibrated value and CPUID's value if the verbose mode is set.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
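
One possible reading of the "measure 1000 times and pick the lowest observed
value" item is sketched below. It assumes that "lowest" means smallest in
magnitude and that the samples have already been collected into an array;
the names lowest_magnitude(), skew_within_threshold() and
SKETCH_SKEW_THRESHOLD are hypothetical and do not come from the tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed error threshold in clock cycles, taken from the log text above. */
#define SKETCH_SKEW_THRESHOLD 1000u

/* Magnitude of a signed sample, computed without overflowing on INT64_MIN. */
static inline uint64_t
sketch_magnitude(int64_t v)
{
	return v < 0 ? 0 - (uint64_t)v : (uint64_t)v;
}

/*
 * Hypothetical helper: return the sample with the smallest magnitude,
 * keeping its sign. Returns 0 for an empty array.
 */
static inline int64_t
lowest_magnitude(const int64_t *samples, size_t n)
{
	int64_t best = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == 0 || sketch_magnitude(samples[i]) < sketch_magnitude(best))
			best = samples[i];
	}
	return best;
}

/* Nonzero if the chosen value is within the assumed threshold. */
static inline int
skew_within_threshold(int64_t v)
{
	return sketch_magnitude(v) <= SKETCH_SKEW_THRESHOLD;
}
```

Taking the minimum over many rounds discards samples inflated by interrupts
or bus contention, which is presumably why a single measurement was not
considered reliable.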


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- This also required changing lapic_per_second to make the LAPIC timer work
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
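
As a rough sketch of the CPUID 0x15 arithmetic: the leaf returns the
denominator of the TSC/crystal ratio in EAX, the numerator in EBX, and the
crystal frequency in Hz in ECX (zero when not enumerated). The helper name
and the fallback parameter below are hypothetical; they only stand in for
the per-processor values mentioned above and are not the code in
identcpu_subr.c.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the TSC frequency in Hz from the
 * register values of CPUID leaf 0x15.
 *
 *   eax = denominator of the TSC/core crystal clock ratio
 *   ebx = numerator of the TSC/core crystal clock ratio
 *   ecx = core crystal clock frequency in Hz, or 0 if not enumerated
 *
 * fallback_crystal_hz is an assumed caller-supplied value (for example
 * from a per-model table) used only when ecx is 0.
 * Returns 0 when the frequency cannot be derived.
 */
static inline uint64_t
tsc_freq_from_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx,
    uint64_t fallback_crystal_hz)
{
	uint64_t crystal = ecx != 0 ? ecx : fallback_crystal_hz;

	if (eax == 0 || ebx == 0 || crystal == 0)
		return 0;
	return crystal * ebx / eax;
}
```

With a 24 MHz crystal and a ratio of 176/2, this yields 2112 MHz.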


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
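
The truncation hazard described above can be illustrated with a small
sketch. The names sketch_uimin(), SKETCH_MIN, clamp_with_uimin() and
clamp_with_macro() are hypothetical stand-ins, assuming a platform where
unsigned int is 32 bits wide; they are not the libkern definitions.

```c
#include <stdint.h>

/* Stand-in with the same shape as a function defined on unsigned int. */
static inline unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no truncation, but each argument may be evaluated twice. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* The 64-bit argument is silently narrowed to unsigned int at the call. */
static inline uint64_t
clamp_with_uimin(uint64_t len)
{
	return sketch_uimin(len, 16);
}

/* The comparison is carried out in 64 bits, so nothing is lost. */
static inline uint64_t
clamp_with_macro(uint64_t len)
{
	return SKETCH_MIN(len, (uint64_t)16);
}
```

Passing 0x100000001 to clamp_with_uimin() gives 1, because the upper 32 bits
are dropped before the comparison, while clamp_with_macro() gives 16.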


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
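
As a rough illustration of how the base, extended, and display fields relate,
the sketch below decodes a CPUID leaf 1 EAX signature using the convention
documented in the Intel and AMD manuals. The sig_* helper names are
hypothetical; they are not the NetBSD macros listed above, whose exact
definitions live in the tree.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not the NetBSD CPUID_TO_* macros. */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base family is 0xf. */
static inline uint32_t sig_family(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);

	return fam == 0xf ? fam + sig_extfamily(sig) : fam;
}

/* Display model: the extended model supplies the high nibble for
 * base family 0x6 or 0xf; otherwise only the base model is used. */
static inline uint32_t sig_model(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, the signature 0x000306a9 decodes to display family 6 and display
model 0x3a, while 0x00100f42 decodes to display family 0x10 and model 4.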


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
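
For readers unfamiliar with the idiom, a minimal sketch of the power-of-two
round-up that roundup2() stands for is shown here. The function name
roundup_pow2_multiple is hypothetical, and the sketch assumes the alignment
argument is a non-zero power of two.

```c
#include <stdint.h>

/*
 * Round x up to the next multiple of align.
 * Assumption: align is a non-zero power of two, so (align - 1) is a
 * mask of the low bits that must be cleared after adding the slack.
 */
static inline uintptr_t roundup_pow2_multiple(uintptr_t x, uintptr_t align)
{
	return (x + align - 1) & ~(align - 1);
}
```

With a 64-byte alignment, 1 rounds up to 64 and 64 stays at 64.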


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
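
The workaround described above amounts to isolating the lowest set bit of the
color count. A minimal sketch follows, assuming an unsigned 32-bit count; the
helper name pow2_divisor is hypothetical and is not taken from the commit.

```c
#include <stdint.h>

/*
 * Return the greatest power of two that divides n.
 * Hypothetical helper: for n == 0 it returns 0, since no such
 * divisor is defined. (~n + 1) is the two's complement negation,
 * so the AND keeps only the lowest set bit of n.
 */
static inline uint32_t pow2_divisor(uint32_t n)
{
	return n & (~n + 1);
}
```

A probe that reports 24 colors would thus be reduced to 8, and a count that is
already a power of two is returned unchanged.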


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
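
To illustrate the general idea of adjusting a flags word without a lock, the
sketch below uses C11 <stdatomic.h> as a portable stand-in. The kernel has its
own atomic operations, and the flag names and helper functions here are
hypothetical, chosen only for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define FLAG_PRESENT 0x1u
#define FLAG_RUNNING 0x2u

/* Set bits with a single atomic read-modify-write, so concurrent
 * updates to other bits in the same word are not lost. */
static inline void flags_set(_Atomic uint32_t *flags, uint32_t bits)
{
	atomic_fetch_or(flags, bits);
}

/* Clear bits atomically. */
static inline void flags_clear(_Atomic uint32_t *flags, uint32_t bits)
{
	atomic_fetch_and(flags, ~bits);
}
```

A plain `flags |= bit` compiles to a separate load and store, which can lose an
update made by another CPU between the two; the atomic form avoids that.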


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.190 08-May-2020 ad

Fix the TSC timecounter (on the systems I have access to):

- Make the early i8254-based calculation of frequency a bit more accurate.

- Keep track of how far the HPET & TSC advance between HPET attach and
secondary CPU boot, and use that to compute an accurate value before attaching
the timecounter. Initial idea from joerg@.

- When determining skew and drift between CPUs, make each measurement 1000
times and pick the lowest observed value. Increase the error threshold to
1000 clock cycles.

- Use the frequency computed on the boot CPU for secondary CPUs too.

- Remove cpu_counter_serializing().
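
The "measure 1000 times and pick the lowest observed value" step can be sketched as follows. This is a hedged illustration, not the committed code: lowest_of_n(), replay_sample() and struct replay are hypothetical names, and the replay helper stands in for a real TSC measurement so the example is deterministic. The presumed rationale is that interference only ever adds delay, so the minimum is the least disturbed sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical measurement callback: returns one observed value. */
typedef uint64_t (*measure_fn)(void *);

/* Repeat a measurement n times and keep the lowest observed value. */
uint64_t
lowest_of_n(measure_fn measure, void *arg, unsigned int n)
{
	uint64_t best = UINT64_MAX;

	assert(n > 0);
	for (unsigned int i = 0; i < n; i++) {
		uint64_t v = measure(arg);

		if (v < best)
			best = v;
	}
	return best;
}

/* Stand-in measurement source that replays canned samples. */
struct replay {
	const uint64_t *values;
	size_t count;
	size_t next;
};

uint64_t
replay_sample(void *arg)
{
	struct replay *r = arg;

	assert(r->next < r->count);
	return r->values[r->next++];
}
```

A caller would pass 1000 as n and compare the result against the error threshold of 1000 clock cycles mentioned above.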


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
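
For reference, the arithmetic behind leaf 0x15 can be sketched as below. The function name is hypothetical and this is not the code added in identcpu_subr.c. It assumes the register layout given in the Intel SDM: EAX holds the ratio denominator, EBX the numerator and ECX the core crystal clock frequency in Hz.

```c
#include <stdint.h>

/*
 * Hypothetical helper: TSC frequency in Hz from CPUID leaf 0x15 values.
 *   denominator = EAX, numerator = EBX, crystal_hz = ECX
 * Returns 0 if any field is not enumerated; the caller would then have
 * to supply the crystal frequency from a per-model table or fall back
 * to another calibration method.
 */
uint64_t
tsc_freq_from_cpuid15(uint32_t denominator, uint32_t numerator,
    uint32_t crystal_hz)
{
	if (denominator == 0 || numerator == 0 || crystal_hz == 0)
		return 0;

	/* The 64-bit product of two 32-bit values cannot overflow. */
	return (uint64_t)crystal_hz * numerator / denominator;
}
```

For example, a 24 MHz crystal with a ratio of 176/2 gives 2112000000 Hz.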


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, it is required when attaching a device,
so obtain it earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
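
The collision check described above can be sketched like this. It is a simplified single-cell model with hypothetical names, not the kCSan implementation. It assumes that two reads never count as a race and that a collision means the byte ranges overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cell describing one sampled memory access. */
struct access_cell {
	uintptr_t addr;		/* start address of the access */
	size_t size;		/* number of bytes accessed */
	bool write;		/* true if the sampled access was a write */
	bool active;		/* true while the sampling CPU is delayed */
};

/*
 * Return true if a new access collides with the sampled one.
 * Assumption: read/read pairs are harmless and are not reported.
 */
bool
cell_conflicts(const struct access_cell *cell, uintptr_t addr, size_t size,
    bool write)
{
	assert(size > 0);
	if (!cell->active)
		return false;
	if (!cell->write && !write)
		return false;

	/* Half-open ranges [addr, addr + size) overlap. */
	return addr < cell->addr + cell->size && cell->addr < addr + size;
}
```

In the cpu0/cpu1 example, cpu0's read fills the cell and cpu1's write to an overlapping range makes the check return true.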

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the
case of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
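
To make the truncation concrete, here is a small sketch. The names are stand-ins, not the libkern definitions. It assumes unsigned int is narrower than 64 bits, which the static_assert checks.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(unsigned int) < sizeof(uint64_t),
    "sketch assumes unsigned int is narrower than 64 bits");

/* Stand-in with the same shape as uimin: operates on unsigned int only. */
static inline unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no truncation, but each argument may be evaluated twice. */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/* Arguments are silently converted to unsigned int before comparing. */
uint64_t
clamp_truncating(uint64_t len, uint64_t limit)
{
	return uimin_sketch(len, limit);
}

/* The comparison is done at full 64-bit width. */
uint64_t
clamp_exact(uint64_t len, uint64_t limit)
{
	return MIN_SKETCH(len, limit);
}
```

With len = 0x100000001 and limit = 0x200000000, clamp_truncating() returns 0 on a system with 32-bit unsigned int because both arguments lose their upper bits. clamp_exact() returns 0x100000001.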


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
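
The difference between the base, extended and display values can be sketched as below. These helper functions are hypothetical and are not the macros from this commit. They assume the CPUID leaf 1 EAX layout and the Intel rule for combining fields. AMD documents the extended fields as applying only when the base family is 0xf.

```c
#include <stdint.h>

/* Raw fields of CPUID leaf 1 EAX. */
static inline uint32_t base_family(uint32_t eax) { return (eax >> 8) & 0xf; }
static inline uint32_t base_model(uint32_t eax)  { return (eax >> 4) & 0xf; }
static inline uint32_t ext_family(uint32_t eax)  { return (eax >> 20) & 0xff; }
static inline uint32_t ext_model(uint32_t eax)   { return (eax >> 16) & 0xf; }

/* Display family: the extended family is added only when base is 0xf. */
uint32_t
display_family(uint32_t eax)
{
	uint32_t fam = base_family(eax);

	return fam == 0xf ? fam + ext_family(eax) : fam;
}

/* Display model: the extended model supplies the high nibble. */
uint32_t
display_model(uint32_t eax)
{
	uint32_t fam = base_family(eax);
	uint32_t model = base_model(eax);

	if (fam == 0x6 || fam == 0xf)
		model |= ext_model(eax) << 4;
	return model;
}
```

For EAX = 0x000906ea this yields family 0x6 and model 0x9e. For EAX = 0x00800f11 it yields family 0x17 and model 0x01.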


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
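
The fallback described above can be sketched roughly as follows. This is
not the committed code: struct counter_ops, counter_serializing_sketch()
and the two fake readers are hypothetical stand-ins for the CPUID_MSR
check, rdmsr(MSR_TSC) and cpu_counter().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of what the running CPU offers. */
struct counter_ops {
	bool has_msr;                     /* stands in for a CPUID_MSR check */
	uint64_t (*read_tsc_msr)(void);   /* stands in for rdmsr(MSR_TSC) */
	uint64_t (*read_counter)(void);   /* stands in for cpu_counter() */
};

/* Prefer the MSR read when the CPU supports it, otherwise fall back. */
static uint64_t
counter_serializing_sketch(const struct counter_ops *ops)
{
	if (ops->has_msr)
		return ops->read_tsc_msr();
	return ops->read_counter();
}

/* Fake readers so the selection logic can be exercised in userland. */
static uint64_t fake_rdmsr_tsc(void) { return 100; }
static uint64_t fake_cpu_counter(void) { return 200; }
```

With has_msr set the sketch returns the MSR value; with it clear (the
vortex86 case from the log) it returns the plain counter value.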


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
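
The sizes quoted above follow directly from the address widths. A small
illustrative check, where phys_addr_space_bytes() is a hypothetical helper
and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with the given number of address bits (bits < 64). */
static uint64_t
phys_addr_space_bytes(unsigned int bits)
{
	assert(bits < 64);
	return UINT64_C(1) << bits;
}
```

36 bits give 2^36 bytes (64 GiB) and 32 bits give 2^32 bytes (4 GiB),
which is why a 32-bit integer can no longer hold a physical address.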

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
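
For reference, rounding up to a power-of-two boundary is usually written
with a mask. The helper below is a hypothetical userland stand-in that
illustrates the operation; it is not the kernel's roundup2() definition.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of m.
 * Assumption: m is a nonzero power of two, so (m - 1) is a low-bit mask.
 */
static uintptr_t
round_up_pow2(uintptr_t x, uintptr_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + m - 1) & ~(m - 1);
}
```

Values already on a boundary are returned unchanged; anything else moves
up to the next multiple of m.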


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
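
The greatest power of two dividing a number is its lowest set bit. A
minimal sketch of that computation, where largest_pow2_divisor() is a
hypothetical name and not necessarily how the commit implements it:

```c
#include <assert.h>

/*
 * Return the greatest power of two that divides n.
 * Assumption: n is nonzero. (~n + 1u) is the two's complement of n,
 * so the AND isolates the lowest set bit.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

For example, a probed value of 12 colors would be reduced to 4, and 24
to 8, while a value that is already a power of two is left unchanged.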


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.189 02-May-2020 bouyer

Introduce Xen PVH support in GENERIC.
This is compiled in with
options XENPVHVM
x86 changes:
- add Xen section and xen pvh entry points to locore.S. Set vm_guest
to VM_GUEST_XENPVH in this entry point.
Most of the boot procedure (especially page table setup and switch to
paged mode) is shared with native.
- change some x86_delay() to delay_func(), which points to x86_delay() for
native/HVM, and xen_delay() for PVH

Xen changes:
- remove Xen bits from init_x86_64_ksyms() and init386_ksyms()
and move to xen_init_ksyms(), used for both PV and PVH
- set ISA no-legacy-devices property for PVH
- factor out code from Xen's cpu_bootconf() to xen_bootconf()
in xen_machdep.c
- set up a specific pvh_consinit() which starts with printk()
(which uses a simple hypercall that is available early) and switch to
xencons when we can use pmap_kenter_pa().


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC value from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency; the Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
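
As a rough illustration of the CPUID 0x15 calculation: the leaf reports
the denominator (EAX) and numerator (EBX) of the TSC/core crystal clock
ratio and, on some processors, the crystal frequency in Hz (ECX). The
function below is a sketch that takes those register values as plain
arguments; its name and the fallback parameter are hypothetical and it
is not the code added in identcpu_subr.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the TSC frequency in Hz from CPUID leaf 0x15 values.
 *   denom      = EAX, numer = EBX, crystal_hz = ECX
 *   fallback_crystal_hz = hypothetical per-model value (e.g. taken from a
 *                         table) used when ECX is reported as 0
 * Returns 0 when the frequency cannot be determined.
 */
static uint64_t
tsc_freq_from_cpuid15(uint32_t denom, uint32_t numer, uint32_t crystal_hz,
    uint32_t fallback_crystal_hz)
{
	if (denom == 0 || numer == 0)
		return 0;		/* ratio not enumerated */
	if (crystal_hz == 0)
		crystal_hz = fallback_crystal_hz;
	if (crystal_hz == 0)
		return 0;		/* crystal frequency unknown */
	/* Widen before multiplying so the product cannot overflow 32 bits. */
	return (uint64_t)crystal_hz * numer / denom;
}
```

With illustrative inputs of a 24 MHz crystal and a 176/2 ratio this
yields 2112000000 Hz. A caller that gets 0 back would need another
calibration method.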


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the
case of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
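
The truncation that motivates the rename can be reproduced in a few lines. This is a minimal sketch, assuming a platform where unsigned int is 32 bits wide; sketch_uimin and SKETCH_MIN are illustrative stand-ins, not the libkern definitions.

```c
#include <assert.h>
#include <stdint.h>

/* The sketch assumes a 32-bit unsigned int, as on amd64 and i386. */
static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* Same shape as a min() defined on unsigned int only. */
static inline unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro: no truncation, but each argument is evaluated twice. */
#define SKETCH_MIN(a, b)	((a) < (b) ? (a) : (b))
```

Passing the 64-bit value 0x100000001 to sketch_uimin converts it to 1 before the comparison, so the call returns 1 rather than the other argument. The macro form compares the full 64-bit values.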


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.
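
For reference, FXSAVE requires its 512-byte memory operand to be 16-byte aligned, so an address with rdi % 16 = 8 faults. The check can be sketched as below; the type and function names are hypothetical, and the sketch only tests alignment, it does not execute FXSAVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FXSAVE_ALIGN	16u
#define SKETCH_FXSAVE_SIZE	512u

/* A save area the compiler is asked to place on a 16-byte boundary. */
struct sketch_fxsave_area {
	_Alignas(16) uint8_t bytes[SKETCH_FXSAVE_SIZE];
};

static_assert(sizeof(struct sketch_fxsave_area) == SKETCH_FXSAVE_SIZE,
    "save area must be exactly 512 bytes");

/* True if addr is a legal FXSAVE operand address as far as alignment goes. */
static inline bool
sketch_fxsave_aligned(uintptr_t addr)
{
	return (addr % SKETCH_FXSAVE_ALIGN) == 0;
}
```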

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.
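
The placeholder-and-patch idea can be modelled in userland on a plain byte buffer. This is a sketch under stated assumptions: the function names are hypothetical, the window is an ordinary array rather than kernel text, and a real kernel additionally has to deal with write protection and instruction-stream synchronization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_JMP_REL8	0xEB	/* opcode of the short jump */
#define SKETCH_INT3	0xCC	/* opcode of the breakpoint trap */

/*
 * Fill a window with "jmp over the window" followed by int3 padding.
 * The rel8 displacement is counted from the end of the 2-byte jump.
 */
static int
sketch_make_placeholder(uint8_t *dst, size_t len)
{
	assert(dst != NULL);
	if (len < 2 || len - 2 > 127)
		return -1;
	dst[0] = SKETCH_JMP_REL8;
	dst[1] = (uint8_t)(len - 2);
	memset(dst + 2, SKETCH_INT3, len - 2);
	return 0;
}

/* Overwrite the window with the real code only if the sizes match exactly. */
static int
sketch_hotpatch(uint8_t *dst, size_t dstlen, const uint8_t *src, size_t srclen)
{
	assert(dst != NULL && src != NULL);
	if (dstlen != srclen)
		return -1;
	memcpy(dst, src, srclen);
	return 0;
}
```

Refusing to patch on a size mismatch mirrors the requirement that the two chunks be of exactly the same size.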

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
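
The slot-to-address arithmetic can be checked with a small sketch, assuming 4-level paging with 48-bit canonical addresses. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_L4_SHIFT	39	/* each L4 slot covers 2^39 bytes (512 GiB) */
#define SKETCH_L4_SLOTS	512	/* 9 index bits per level */

/* Return the lowest canonical virtual address covered by an L4 slot. */
static inline uint64_t
sketch_l4_slot_to_va(unsigned slot)
{
	uint64_t va;

	assert(slot < SKETCH_L4_SLOTS);
	va = (uint64_t)slot << SKETCH_L4_SHIFT;
	/* Canonical form: copy bit 47 into bits 48-63. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

Slot 384 is 0x180, so the shift sets bits 47 and 46, and sign extension yields the address quoted above.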

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
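
The relationship between the base, extended and display fields can be sketched as follows. The field positions are those of the CPUID leaf 1 signature in %eax. The combination rule follows the vendor documentation (Intel's rule for the extended model is used here), and the function names are illustrative stand-ins rather than the kernel macros.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature. */
static inline unsigned sketch_stepping(uint32_t sig)    { return sig & 0xf; }
static inline unsigned sketch_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline unsigned sketch_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline unsigned sketch_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline unsigned sketch_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when the base is 0xf. */
static inline unsigned
sketch_display_family(uint32_t sig)
{
	unsigned fam = sketch_base_family(sig);

	return fam == 0xf ? fam + sketch_ext_family(sig) : fam;
}

/* Display model: the extended model forms the high nibble for families 6 and 0xf. */
static inline unsigned
sketch_display_model(uint32_t sig)
{
	unsigned fam = sketch_base_family(sig);
	unsigned model = sketch_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sketch_ext_model(sig) << 4;
	return model;
}
```

For the signature 0x000906ea this gives family 6, model 0x9e and stepping 0xa. For 0x00800f11 it gives family 0x17 and model 0x01.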


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by kernel and MMU.
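
The size arithmetic can be made concrete with a short sketch. The type and macro names are hypothetical and merely stand in for a 32-bit and a 64-bit physical address type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sketch_paddr32_t;	/* physical address without PAE */
typedef uint64_t sketch_paddr64_t;	/* physical address with PAE */

static_assert(sizeof(sketch_paddr32_t) == 4, "non-PAE physical addresses are 32 bits");

#define SKETCH_PAE_PA_BITS	36
#define SKETCH_PAE_PA_LIMIT	(1ULL << SKETCH_PAE_PA_BITS)	/* 64 GiB */

/* True if a physical address survives being stored in the 32-bit type. */
static inline bool
sketch_fits_in_paddr32(sketch_paddr64_t pa)
{
	return (sketch_paddr64_t)(sketch_paddr32_t)pa == pa;
}
```

Any physical address at or above 4 GiB fails this check, which is why structures shared with 32-bit code cannot assume that an address field has the size of the physical address type.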

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
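
As a rough illustration of the operation that a roundup2()-style helper
replaces, the sketch below rounds a size up to a power-of-two boundary with
a mask. The name roundup2_sketch is hypothetical and is not the kernel macro
itself; the alignment is assumed to be a non-zero power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a roundup2()-style helper: round x up to the
 * next multiple of align. Only valid when align is a power of two.
 */
static size_t
roundup2_sketch(size_t x, size_t align)
{
	/* A power of two has exactly one bit set. */
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}
```

Values that are already aligned are returned unchanged, which is the usual
reason to prefer the mask form over an open-coded add-and-divide.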


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
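
The workaround described above (the greatest power of two that divides the
desired number of cache colors) can be sketched with a single bit trick. The
function name is hypothetical and this is not the committed code; it only
shows the arithmetic.

```c
/*
 * Hypothetical helper: return the largest power of two that divides n.
 * n & (~n + 1u) isolates the lowest set bit of n, which is exactly that
 * divisor. Returns 0 for n == 0.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	return n & (~n + 1u);
}
```

For example, 12 colors would be reduced to 4 and 24 to 8, while a count
that is already a power of two is left unchanged.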


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
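
A minimal sketch of what adjusting a flags word atomically looks like,
written with C11 <stdatomic.h> rather than the kernel's own atomic
primitives. The structure, flag values and function names are all
hypothetical and only illustrate the read-modify-write pattern.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits, not the kernel's CPUF_* values. */
#define FLAG_SKETCH_PRESENT	0x1u
#define FLAG_SKETCH_RUNNING	0x2u

/* Hypothetical per-CPU structure holding only the flags word. */
struct cpu_sketch {
	_Atomic uint32_t flags;
};

/* Set bits without losing concurrent updates from other CPUs. */
static void
flags_set(struct cpu_sketch *ci, uint32_t bits)
{
	atomic_fetch_or(&ci->flags, bits);
}

/* Clear bits with the same guarantee. */
static void
flags_clear(struct cpu_sketch *ci, uint32_t bits)
{
	atomic_fetch_and(&ci->flags, ~bits);
}

/* Read the current flags word. */
static uint32_t
flags_get(struct cpu_sketch *ci)
{
	return atomic_load(&ci->flags);
}
```

A plain "flags |= bits" is a separate load and store, so two CPUs updating
different bits at the same time can overwrite each other; the atomic forms
avoid that without taking a lock.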


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.188 29-Apr-2020 ad

Back out HPET delay & TSC changes to rule them out as the cause for recent
hangs during boot etc.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer
work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
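
The arithmetic behind the CPUID 0x15 approach can be sketched as follows.
Leaf 0x15 reports the TSC/core crystal clock ratio as a numerator and a
denominator, plus the crystal frequency in Hz, which may be reported as
zero. The function below takes those values as plain parameters instead of
executing CPUID; its name and the idea of the caller supplying a fallback
crystal frequency are assumptions for illustration, not the committed code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the TSC frequency in Hz from the CPUID
 * leaf 0x15 values.
 *   denom      - ratio denominator (EAX)
 *   numer      - ratio numerator (EBX)
 *   crystal_hz - core crystal clock in Hz (ECX, or a value the caller
 *                looked up elsewhere when ECX reads as zero)
 * Returns 0 when the frequency cannot be computed.
 */
static uint64_t
tsc_hz_from_leaf15(uint32_t denom, uint32_t numer, uint64_t crystal_hz)
{
	if (denom == 0 || numer == 0 || crystal_hz == 0)
		return 0;
	return crystal_hz * numer / denom;
}
```

With a 24 MHz crystal and a ratio of 176/2, the result is 2112000000 Hz. A
zero return is the cue to fall back to a measured calibration.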


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
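
The two steps above can be modelled in a few lines. This is a simplified,
single-threaded sketch and not kCSan's code: the cell layout, the names and
the fixed CPU count are hypothetical, the sampling counter and the delay are
left out, and the model assumes that two reads never count as a race.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU_SKETCH 4	/* hypothetical CPU count for the model */

/* One cell per CPU, describing a sampled access (hypothetical layout). */
struct cell_sketch {
	uintptr_t addr;
	size_t size;
	bool write;
	bool valid;
};

static struct cell_sketch cells[NCPU_SKETCH];

/* Step 1: publish a cell for a sampled access; the CPU would then stall. */
static void
cell_install(unsigned int cpu, uintptr_t addr, size_t size, bool write)
{
	cells[cpu] = (struct cell_sketch){ addr, size, write, true };
}

/* Withdraw the cell once the delay has elapsed. */
static void
cell_clear(unsigned int cpu)
{
	cells[cpu].valid = false;
}

/* Step 2: check done on every access; true means a race would be reported. */
static bool
access_races(unsigned int cpu, uintptr_t addr, size_t size, bool write)
{
	for (unsigned int i = 0; i < NCPU_SKETCH; i++) {
		const struct cell_sketch *c = &cells[i];

		if (i == cpu || !c->valid)
			continue;
		/* Overlapping ranges, and at least one side is a write. */
		if (addr < c->addr + c->size && c->addr < addr + size &&
		    (write || c->write))
			return true;
	}
	return false;
}
```

In the cpu0/cpu1 example from the text, cpu0 installs a read cell and the
check run for cpu1's write to the same address returns true.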

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the
case of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
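
The difference between the two kinds of min can be shown with a short
sketch. The names below are hypothetical stand-ins, and the example assumes
a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Function form: arguments are converted to unsigned int on the way in. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: evaluates in the operands' own type, so no truncation,
 * but each argument may be evaluated twice. */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/* Passing 64-bit values to the function form silently drops the top bits. */
static uint64_t
min_via_function(uint64_t a, uint64_t b)
{
	return uimin_sketch(a, b);
}

/* The macro form compares the full 64-bit values. */
static uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
	return MIN_SKETCH(a, b);
}
```

With a = 0x100000001 and b = 5, the function form returns 1 because a is
truncated to its low 32 bits before the comparison, while the macro form
returns 5.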

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
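
For context on the alignment remark: FXSAVE requires its memory operand to
sit on a 16-byte boundary, so an address with a remainder of 8 modulo 16
faults. A minimal check, with a hypothetical function name, looks like
this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is usable as an FXSAVE operand,
 * i.e. it lies on a 16-byte boundary. */
static bool
fxsave_area_aligned(uintptr_t addr)
{
	return (addr & 0xf) == 0;
}
```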


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
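
The slot-to-address relation can be checked with a short calculation,
assuming 4-level amd64 paging in which each L4 slot covers 2^39 bytes and
addresses are sign-extended from bit 47. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: base virtual address covered by an L4 slot.
 * Each of the 512 L4 slots spans 1 << 39 bytes; bit 47 is copied into
 * bits 48..63 to form a canonical address.
 */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << 39;

	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

Slot 384 shifts to 0x0000c00000000000, and sign extension turns that into
0xffffc00000000000.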

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
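
The distinction between the base, extended and display fields can be sketched as follows. This is a minimal illustration of the CPUID leaf 1 EAX signature encoding as documented by Intel and AMD, not a copy of the NetBSD macros; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers; bit positions follow the CPUID leaf 1 EAX layout. */
static unsigned sig_stepping(uint32_t sig)    { return sig & 0xf; }
static unsigned sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field is added only when base family is 0xf. */
static unsigned
sig_display_family(uint32_t sig)
{
	unsigned family = sig_base_family(sig);

	if (family == 0xf)
		family += sig_ext_family(sig);
	return family;
}

/* Display model: the extended field supplies the high nibble for families 6 and 0xf. */
static unsigned
sig_display_model(uint32_t sig)
{
	unsigned family = sig_base_family(sig);
	unsigned model = sig_base_model(sig);

	if (family == 0x6 || family == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

For a signature of 0x000906ea this yields family 0x6, model 0x9e and stepping 0xa; for 0x00800f11 it yields family 0x17 and model 0x01.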


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
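
The sizes quoted above, and the reason a 32-bit variable no longer suffices for a physical address, can be checked with a short sketch. The helper names are hypothetical and only illustrate the arithmetic; they do not appear in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAE_PADDR_BITS  36u  /* physical address width with PAE */
#define I386_VADDR_BITS 32u  /* virtual address width, unchanged by PAE */

/* Size in bytes of an address space that is 'bits' wide. */
static uint64_t
addr_space_bytes(unsigned bits)
{
	assert(bits < 64);
	return UINT64_C(1) << bits;
}

/* True if a physical address survives a round trip through 32 bits. */
static bool
fits_in_32_bits(uint64_t pa)
{
	return (uint64_t)(uint32_t)pa == pa;
}
```

Any physical address at or above 4 GiB fails fits_in_32_bits(), which is why such addresses have to be carried in 64-bit variables under PAE.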

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
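
As an illustration of the workaround described above, the greatest power of two that divides a number is its lowest set bit. The following is a minimal sketch, not the code from this revision; the name pow2_divisor is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the greatest power of two that divides ncolors.
 * For a nonzero value, n & (~n + 1) isolates the lowest set bit,
 * which is exactly that divisor.  (Hypothetical helper, for
 * illustration only.)
 */
static inline uint32_t
pow2_divisor(uint32_t ncolors)
{
	assert(ncolors != 0);	/* zero has no lowest set bit */
	return ncolors & (~ncolors + 1u);
}
```

For example, a probed value of 24 colors would be reduced to 8, while a value that is already a power of two is returned unchanged.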


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.187 25-Apr-2020 bouyer

Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM
guests in GENERIC.
Xen support can be disabled at runtime with
boot -c
disable hypervisor


Revision tags: bouyer-xenpvh-base2
# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency; the Intel SDM gives the values for some of those processors.
- This also requires changing lapic_per_second so that the LAPIC timer works
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
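
The arithmetic behind the CPUID 0x15 approach can be sketched as follows. This is an illustration under stated assumptions, not the code from this revision: CPUID leaf 0x15 is taken to report the ratio denominator in EAX and the numerator in EBX, the crystal frequency is supplied by the caller (from ECX, or from a table when ECX is zero), and the name tsc_hz_from_ratio is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the TSC frequency from the core crystal clock frequency and
 * the TSC/crystal ratio:
 *
 *	tsc_hz = crystal_hz * numerator / denominator
 *
 * numerator and denominator are assumed to come from CPUID leaf 0x15
 * (EBX and EAX respectively).  (Hypothetical helper.)
 */
static inline uint64_t
tsc_hz_from_ratio(uint64_t crystal_hz, uint32_t numerator,
    uint32_t denominator)
{
	assert(denominator != 0);	/* a zero ratio means "not enumerated" */
	return crystal_hz * numerator / denominator;
}
```

With a 24 MHz crystal and a ratio of 176/2, this yields 2112 MHz.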


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
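
The truncation hazard described above can be shown with a small sketch. It assumes unsigned int is narrower than 64 bits; demo_uimin, DEMO_MIN and uimin_disagrees are hypothetical stand-ins, not the libkern definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a min() defined on unsigned int (hypothetical). */
static inline unsigned int
demo_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Stand-in for a type-generic macro; evaluates its arguments twice. */
#define DEMO_MIN(a, b)	((a) < (b) ? (a) : (b))

/*
 * Return nonzero when the unsigned-int function and the macro disagree
 * for the given 64-bit inputs.  The explicit casts make visible the
 * conversion that would otherwise happen silently at the call site.
 */
static inline int
uimin_disagrees(uint64_t a, uint64_t b)
{
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	return demo_uimin((unsigned int)a, (unsigned int)b) != DEMO_MIN(a, b);
}
```

With 32-bit unsigned int, the inputs 0x100000001 and 2 disagree: the function sees 1 and 2 and returns 1, while the macro returns 2.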


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
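
The correspondence between L4 slot 384 and that address can be checked with a short sketch. It assumes 4-level paging with 48-bit canonical virtual addresses, where each L4 slot covers 2^39 bytes; the names below are hypothetical and not taken from the pmap code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_L4_SHIFT	39	/* each L4 slot maps 2^39 bytes (512 GB) */
#define DEMO_L4_NSLOTS	512	/* 9 index bits per level */

/*
 * Return the canonical base virtual address of an L4 slot.
 * Bit 47 is sign-extended into bits 63..48, as required for canonical
 * addresses with 48-bit virtual addressing.  (Hypothetical helper.)
 */
static inline uint64_t
demo_l4_slot_base(unsigned int slot)
{
	uint64_t va;

	assert(slot < DEMO_L4_NSLOTS);
	va = (uint64_t)slot << DEMO_L4_SHIFT;
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

Slot 384 is 0x180, and 0x180 << 39 is 0x0000c00000000000; sign extension then gives 0xffffc00000000000.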

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
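
As a rough illustration of what "display family" and "display model" mean,
the sketch below decodes a CPUID leaf 1 EAX signature using the field layout
documented in the Intel and AMD manuals. The sig_* function names are
hypothetical stand-ins chosen for this example; they are not the CPUID_TO_*
macros themselves, and the combined 0x6/0xf rule for the model is the common
convention rather than a quote from the commit.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not the CPUID_TO_* macros. */

/* Base fields of the CPUID leaf 1 EAX signature. */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }

/* Extended fields of the same signature. */
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when the base is 0xf. */
static inline uint32_t sig_family(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);

	return fam == 0xf ? fam + sig_extfamily(sig) : fam;
}

/* Display model: the extended model is the high nibble for base 0x6 or 0xf. */
static inline uint32_t sig_model(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

Callers that only want the raw 4-bit fields would use the base helpers,
which corresponds to the "only for the base field" macros listed above.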


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@R for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
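
For readers unfamiliar with the idiom, a minimal sketch of what a
roundup2()-style operation computes is shown below. The function name
sketch_roundup2 is hypothetical, and the expression is the usual
power-of-two rounding trick, assumed here rather than copied from the
NetBSD headers.

```c
#include <stddef.h>

/*
 * Round x up to the next multiple of m.
 * Assumption: m is a power of two, so (m - 1) is a mask of the low bits.
 */
static inline size_t sketch_roundup2(size_t x, size_t m)
{
	return (x + m - 1) & ~(m - 1);
}
```

Using a named helper instead of the open-coded mask keeps the power-of-two
requirement visible at the call site.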


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
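
The workaround described above can be sketched as follows. This is only an
illustration of "greatest power of two which is a divisor": the function
name largest_pow2_divisor is hypothetical and the code is not taken from
cpu.c.

```c
#include <assert.h>

/*
 * Return the greatest power of two that divides n.
 * For a nonzero unsigned n, (0u - n) is the two's complement of n, and
 * n & (0u - n) isolates the lowest set bit, which is exactly that divisor.
 */
static unsigned int largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);		/* zero has no lowest set bit */
	return n & (0u - n);
}
```

With 24 desired colors, for example, the lowest set bit is 8, so 8 colors
would be used; a count that is already a power of two is returned unchanged.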


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.186 23-Apr-2020 ad

- Install HPET based DELAY() before going multiuser then recalibrate the TSC.
Idea from joerg@.

- Take overhead into account when computing CPU frequency.

- Don't flush cache before computing TSC skew.


Revision tags: phil-wifi-20200421
# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take TSC value from CPUID. Some processors
can take TSC/core crystal clock ratio but core crystal clock frequency
can't be taken. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
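
A minimal sketch of the CPUID 0x15 arithmetic, assuming the register layout
given in the Intel SDM (EAX = ratio denominator, EBX = ratio numerator,
ECX = core crystal clock in Hz, or 0 when not enumerated). The function
name tsc_hz_from_leaf15 and the fallback_hz parameter are hypothetical;
the latter stands in for a per-model crystal frequency looked up from a
table when ECX is 0. This is not the code in identcpu_subr.c.

```c
#include <stdint.h>

/*
 * Derive the TSC frequency in Hz from CPUID leaf 0x15 register values.
 * Returns 0 when the leaf does not provide enough information.
 * fallback_hz is a hypothetical parameter: the crystal frequency to use
 * when the CPU reports the ratio but not the crystal clock (ecx == 0).
 */
static uint64_t tsc_hz_from_leaf15(uint32_t eax, uint32_t ebx, uint32_t ecx,
    uint64_t fallback_hz)
{
	uint64_t crystal_hz = (ecx != 0) ? (uint64_t)ecx : fallback_hz;

	if (eax == 0 || ebx == 0 || crystal_hz == 0)
		return 0;	/* ratio or crystal clock unknown */

	/* TSC Hz = crystal Hz * numerator / denominator, in 64-bit math. */
	return crystal_hz * ebx / eax;
}
```

The multiplication is done before the division to avoid losing precision;
for realistic crystal frequencies and ratios the product fits comfortably
in 64 bits.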


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
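
The collision check in the second step can be modelled roughly as below. This
is a toy sketch, not the kernel implementation: the struct, the function
names, and the rule that two reads never collide are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy description of a sampled access (hypothetical layout). */
struct cell {
	bool      inuse;	/* a sampled access is currently pending */
	bool      write;	/* the sampled access was a write */
	uintptr_t addr;		/* start address of the sampled access */
	size_t    size;		/* length of the sampled access in bytes */
};

/* True if [a, a+alen) and [b, b+blen) share at least one byte. */
bool
ranges_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
	return a < b + blen && b < a + alen;
}

/*
 * Check a new access against another CPU's pending cell. Two reads are
 * treated as harmless; overlapping accesses where at least one side is
 * a write are reported as a collision.
 */
bool
access_collides(const struct cell *c, uintptr_t addr, size_t size, bool write)
{
	if (!c->inuse)
		return false;
	if (!c->write && !write)
		return false;
	return ranges_overlap(c->addr, c->size, addr, size);
}
```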

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
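
A small sketch of the difference, assuming a 32-bit unsigned int. The uimin
below is an illustrative stand-in with the signature described above, and
min_via_uimin is a hypothetical wrapper that makes the implicit narrowing
explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the macros quoted above: type-generic, no truncation. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Illustrative stand-in for a min() defined on unsigned int. */
unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * What a caller holding 64-bit values effectively gets. The casts are
 * written out here, but C performs the same narrowing implicitly when
 * a 64-bit argument is passed to an unsigned int parameter.
 */
uint64_t
min_via_uimin(uint64_t a, uint64_t b)
{
	return uimin((unsigned int)a, (unsigned int)b);
}
```

With a = 0x100000001 and b = 5, the macro yields 5, while the function sees
a truncated to 1 and returns 1.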

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.
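
The address quoted for L4 slot 384 follows from the amd64 paging layout. A
minimal sketch, assuming 4-level paging with 48-bit canonical virtual
addresses; the helper name and the constants' names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT	39	/* each L4 slot spans 2^39 bytes (512 GiB) */
#define VA_BITS		48	/* 4-level paging: 48-bit virtual addresses */

/* Hypothetical helper: base virtual address of L4 slot `slot` (0..511). */
uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT;

	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (1ULL << (VA_BITS - 1)))
		va |= ~((1ULL << VA_BITS) - 1);
	return va;
}
```

Slot 384 gives 384 << 39 = 0x0000c00000000000, which sign-extends to
0xffffc00000000000.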

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
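
The display family/model rules can be sketched as below. The lower-case
function names are hypothetical stand-ins, not the macros themselves; the
field layout is that of the CPUID leaf 1 EAX signature, and the combination
rules are the usual ones (extended family is added only when the base family
is 0xf, extended model is used only for base families 6 and 0xf).

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction from the CPUID leaf 1 EAX signature. */
uint32_t sig_stepping(uint32_t sig)    { return sig & 0xf; }
uint32_t sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
uint32_t sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
uint32_t sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only if base family is 0xf. */
uint32_t
display_family(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);

	if (fam == 0xf)
		fam += sig_ext_family(sig);
	return fam;
}

/* Display model: the extended model is the high nibble for families 6, 0xf. */
uint32_t
display_model(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

A signature of 0x000906ea decodes to family 6, model 0x9e, stepping 0xa;
0x00800f11 decodes to family 0x17, model 0x01, stepping 1.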


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark a few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of physical addresses, they are manipulated as
64-bit variables by the kernel and MMU.
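
The sizes quoted above follow directly from the address widths. A small
sketch of the arithmetic (addr_space_bytes() and fits_in_32_bits() are
hypothetical helper names used only for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of bytes addressable with the given number of address bits. */
static uint64_t
addr_space_bytes(unsigned int bits)
{
	assert(bits < 64);
	return (uint64_t)1 << bits;
}

/* True if a physical address survives a round trip through 32 bits. */
static bool
fits_in_32_bits(uint64_t pa)
{
	return (uint64_t)(uint32_t)pa == pa;
}
```

addr_space_bytes(36) is 64 GiB and addr_space_bytes(32) is 4 GiB. Any
physical address at or above 4 GiB fails fits_in_32_bits(), which is why a
32-bit type is no longer enough to hold one.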

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
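
For reference, rounding up to a power-of-two multiple is usually written as
an add followed by a mask. The function below is a sketch of that idiom under
the assumption that the alignment is a power of two; sketch_roundup2() is a
hypothetical name, not the kernel macro itself.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of m, where m must be a power of two. */
static size_t
sketch_roundup2(size_t x, size_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + (m - 1)) & ~(m - 1);
}
```

With m = 8, the value 13 rounds up to 16 and 16 stays at 16.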


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
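
The workaround can be sketched with a single bit trick: isolating the lowest
set bit of a number yields the greatest power of two that divides it. This is
an illustration of the idea only; pow2_divisor() is a hypothetical name and
the actual code may compute the value differently.

```c
#include <assert.h>

/*
 * Greatest power of two that divides ncolors.  Adding one to the
 * complement gives the two's complement negation, and ANDing a value
 * with its negation keeps only the lowest set bit.
 */
static unsigned int
pow2_divisor(unsigned int ncolors)
{
	assert(ncolors != 0);
	return ncolors & (~ncolors + 1u);
}
```

For example, 24 colors reduce to 8, while an odd count such as 7 reduces
to 1.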


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.185 21-Apr-2020 msaitoh

Get TSC frequency from CPUID 0x15 and/or 0x16 for newer Intel processors.

- If the max CPUID leaf is >= 0x15, take the TSC frequency from CPUID. Some
processors report the TSC/core crystal clock ratio but not the core crystal
clock frequency. The Intel SDM gives us the values for some processors.
- It is also required to change lapic_per_second to make the LAPIC timer work
correctly.
- Add new file x86/x86/identcpu_subr.c to share common subroutines between
kernel and userland. Some code in x86/x86/identcpu.c and cpuctl/arch/i386.c
will be moved to this file in future.
- Add comment to clarify.
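
The calculation described in the first item can be sketched as below. It
assumes the leaf 0x15 layout as documented in the Intel SDM (EAX holds the
ratio denominator, EBX the numerator, ECX the crystal frequency in Hz, with 0
meaning "not reported"). The function name and the fallback parameter are
hypothetical; in the real code the fallback would come from a per-model
table.

```c
#include <assert.h>
#include <stdint.h>

/*
 * TSC frequency in Hz from CPUID leaf 0x15 register values.
 * Returns 0 when the ratio or the crystal frequency is unknown.
 */
static uint64_t
tsc_freq_from_leaf15(uint32_t denom, uint32_t numer, uint32_t crystal_hz,
    uint64_t fallback_crystal_hz)
{
	uint64_t crystal;

	/* The ratio is unusable unless both halves are reported. */
	if (denom == 0 || numer == 0)
		return 0;

	/* Some processors give the ratio but not the crystal frequency. */
	crystal = crystal_hz != 0 ? crystal_hz : fallback_crystal_hz;
	if (crystal == 0)
		return 0;

	return crystal * numer / denom;
}
```

With a 24 MHz crystal and a 216/2 ratio this yields 2592000000 Hz.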


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the
case of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
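
The difference between the two kinds of min can be sketched as follows. The
names sketch_uimin() and SKETCH_MIN are hypothetical stand-ins, and the
example assumes unsigned int is 32 bits wide. The explicit casts model the
implicit conversion that happens when 64-bit values are passed to a function
declared on unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Function form: defined on unsigned int, like the renamed helper. */
static unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: type-generic, but evaluates its arguments more than once. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* The upper 32 bits of each argument are dropped before comparing. */
static uint64_t
truncating_min(uint64_t a, uint64_t b)
{
	return sketch_uimin((unsigned int)a, (unsigned int)b);
}

/* The comparison is done on the full 64-bit values. */
static uint64_t
macro_min(uint64_t a, uint64_t b)
{
	return SKETCH_MIN(a, b);
}
```

For the inputs 0x100000005 and 7, truncating_min() returns 5 (the low 32 bits
of the first argument), whereas macro_min() returns 7.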


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.
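
A minimal sketch of the CR4_PGE toggle described above, written as plain C
on an ordinary integer rather than the real control register. The helper
name cr4_for_svs is hypothetical; only the CR4_PGE bit position (bit 7) is
taken from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_PGE (UINT32_C(1) << 7)	/* CR4 bit 7: page global enable */

/* Return the CR4 value to load: PGE off while SVS is on, PGE on otherwise. */
uint32_t cr4_for_svs(uint32_t cr4, bool svs_enabled)
{
	uint32_t out = svs_enabled ? (cr4 & ~CR4_PGE) : (cr4 | CR4_PGE);

	/* The two states are mutually exclusive by construction. */
	assert(((out & CR4_PGE) != 0) == !svs_enabled);
	return out;
}
```

With global pages switched off in one place, no per-page PG_G bookkeeping is
needed, which is the simplification the commit message describes.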


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.
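
The same-size replacement can be illustrated in userland on a plain byte
buffer. This is only a sketch under stated assumptions: slot_init and
slot_patch are hypothetical names, the buffer stands in for kernel text, and
the replacement bytes are whatever the caller supplies rather than the real
encoding of the instructions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_JMP_REL8 0xEB	/* x86 short jump: opcode plus 8-bit displacement */
#define OP_INT3     0xCC	/* x86 breakpoint, used as never-executed padding */

/* Fill a slot of len bytes with "jump over the slot" followed by int3. */
int slot_init(uint8_t *slot, size_t len)
{
	if (len < 2 || len - 2 > 127)
		return -1;	/* rel8 can skip at most 127 bytes forward */
	slot[0] = OP_JMP_REL8;
	slot[1] = (uint8_t)(len - 2);	/* relative to the next instruction */
	memset(slot + 2, OP_INT3, len - 2);
	return 0;
}

/* Overwrite the slot with the real code; refuse if the sizes differ. */
int slot_patch(uint8_t *slot, size_t slot_len,
    const uint8_t *code, size_t code_len)
{
	assert(slot != NULL && code != NULL);
	if (code_len != slot_len)
		return -1;
	memcpy(slot, code, code_len);
	return 0;
}
```

Rejecting a size mismatch mirrors the requirement that both chunks occupy
exactly the same number of bytes, so nothing after the slot moves.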

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.
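
A small model of why the cached value must be refreshed. This is a sketch,
not the kernel code: struct model_cpu and both helpers are hypothetical, and
CPUID is simulated by a function so the example runs anywhere. The bit
positions (CR4 bit 18, CPUID leaf 1 ECX bit 27) are architectural.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_OSXSAVE    (UINT32_C(1) << 18)	/* CR4 bit 18 */
#define CPUID2_OSXSAVE (UINT32_C(1) << 27)	/* CPUID leaf 1, ECX bit 27 */

struct model_cpu {
	uint32_t cr4;		/* modeled control register */
	uint32_t feat_ecx;	/* cached CPUID.1:ECX, like ci_feat_val[1] */
};

/* What CPUID.1:ECX would report for the current CR4, given the fixed bits. */
uint32_t model_cpuid1_ecx(const struct model_cpu *c, uint32_t fixed_bits)
{
	uint32_t ecx = fixed_bits & ~CPUID2_OSXSAVE;

	if (c->cr4 & CR4_OSXSAVE)
		ecx |= CPUID2_OSXSAVE;	/* this bit mirrors CR4.OSXSAVE */
	return ecx;
}

/* Set CR4 bits and refresh the cache so it does not go stale. */
void model_set_cr4(struct model_cpu *c, uint32_t bits, uint32_t fixed_bits)
{
	c->cr4 |= bits;
	c->feat_ecx = model_cpuid1_ecx(c, fixed_bits);
	assert(!!(c->feat_ecx & CPUID2_OSXSAVE) == !!(c->cr4 & CR4_OSXSAVE));
}
```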


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
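
The address follows from the slot number by simple arithmetic. The sketch
below assumes ordinary 4-level paging (each L4 entry covers 2^39 bytes, 48-bit
canonical addresses); l4_slot_base is a hypothetical helper, not a kernel
function.

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT   39	/* each L4 entry covers 2^39 bytes (512 GiB) */
#define L4_ENTRIES 512

/* Return the lowest canonical virtual address covered by an L4 slot. */
uint64_t l4_slot_base(unsigned slot)
{
	uint64_t va;

	assert(slot < L4_ENTRIES);
	va = (uint64_t)slot << L4_SHIFT;
	/* Canonical form: bits 63..48 must be copies of bit 47. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

For slot 384 the shift gives 0x0000c00000000000, and sign extension of bit 47
yields the 0xffffc00000000000 quoted above.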

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.
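
On amd64 the SYSCALL instruction clears every RFLAGS bit that is set in the
syscall flag-mask MSR. Assuming the change works by adding PSL_D to that mask
(the log does not spell this out), the effect can be sketched as follows;
rflags_after_syscall is a hypothetical helper operating on plain integers.

```c
#include <assert.h>
#include <stdint.h>

#define PSL_D  UINT64_C(0x00000400)	/* direction flag, RFLAGS bit 10 */
#define PSL_AC UINT64_C(0x00040000)	/* alignment check, RFLAGS bit 18 */

/* RFLAGS seen by the kernel right after SYSCALL: masked bits are cleared. */
uint64_t rflags_after_syscall(uint64_t user_rflags, uint64_t sfmask)
{
	uint64_t r = user_rflags & ~sfmask;

	assert((r & sfmask) == 0);
	return r;
}
```

Interrupts and traps do not apply this mask, which is why those entry points
still have to clear the flags by hand.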


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some systems with a
(broken?) BIOS. Don't panic when a local APIC's interrupt input pin number
(LINTx) > 1. Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
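
The difference between the base, extended and display values can be sketched
as below. This follows the rule documented in Intel's CPUID description
(extended family is added only when the base family is 0xf; extended model is
prepended when the base family is 0x6 or 0xf) and is an assumption about what
the macros compute, not a copy of them. All function names are hypothetical,
and the signature value is just an example.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
unsigned cpuid_stepping(uint32_t id)    { return id & 0xf; }
unsigned cpuid_base_model(uint32_t id)  { return (id >> 4) & 0xf; }
unsigned cpuid_base_family(uint32_t id) { return (id >> 8) & 0xf; }
unsigned cpuid_ext_model(uint32_t id)   { return (id >> 16) & 0xf; }
unsigned cpuid_ext_family(uint32_t id)  { return (id >> 20) & 0xff; }

/* Display family: the extended field only counts when the base is 0xf. */
unsigned cpuid_display_family(uint32_t id)
{
	unsigned f = cpuid_base_family(id);

	return f == 0xf ? f + cpuid_ext_family(id) : f;
}

/* Display model: the extended field is prepended for family 0x6 or 0xf. */
unsigned cpuid_display_model(uint32_t id)
{
	unsigned f = cpuid_base_family(id);
	unsigned m = cpuid_base_model(id);

	return (f == 0x6 || f == 0xf) ? (cpuid_ext_model(id) << 4) | m : m;
}

/* Usage: decode one example signature value. */
void cpuid_decode_example(void)
{
	uint32_t id = UINT32_C(0x000906ea);

	assert(cpuid_display_family(id) == 0x06);
	assert(cpuid_display_model(id) == 0x9e);
	assert(cpuid_stepping(id) == 0x0a);
}
```

Keeping the raw-field accessors separate from the display helpers is what
lets callers avoid re-deriving the combination rule at each use site.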


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
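
The fallback can be sketched as a selection on the feature bits. This is an
illustration only: pick_counter and the two stub readers are hypothetical,
and the stubs return fixed values instead of executing rdmsr or reading the
counter, so the example runs on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_TSC (UINT32_C(1) << 4)	/* CPUID leaf 1, EDX bit 4 */
#define CPUID_MSR (UINT32_C(1) << 5)	/* CPUID leaf 1, EDX bit 5 */

typedef uint64_t (*counter_fn)(void);

/* Stand-ins for the two hardware readers; real code would touch the CPU. */
uint64_t stub_read_via_msr(void)     { return 1; }
uint64_t stub_read_via_counter(void) { return 2; }

/* Pick the MSR path when rdmsr exists, the plain counter otherwise. */
counter_fn pick_counter(uint32_t feat_edx, counter_fn via_msr,
    counter_fn via_counter)
{
	assert(via_msr != NULL && via_counter != NULL);
	return (feat_edx & CPUID_MSR) ? via_msr : via_counter;
}
```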


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
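
The sizes quoted above are just powers of two, as this small sketch shows.
addr_space_gib is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Size in GiB of an address space that is the given number of bits wide. */
uint64_t addr_space_gib(unsigned bits)
{
	assert(bits >= 30 && bits < 64);
	return (UINT64_C(1) << bits) >> 30;
}
```

A 36-bit physical address does not fit in a 32-bit integer, which is why the
physical address type has to widen to 64 bits.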

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
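
For reference, the usual power-of-two round-up looks like the sketch below.
round_up_pow2 is a hypothetical stand-in written for illustration; it is
assumed, not confirmed by this log, that roundup2() performs the same
operation.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of m, where m must be a power of two. */
size_t round_up_pow2(size_t x, size_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);	/* power-of-two check */
	return (x + m - 1) & ~(m - 1);
}
```

Naming the operation once avoids repeating the mask arithmetic, and the risk
of getting it subtly wrong, at each call site.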


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
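
As a rough illustration of the workaround described in this entry, the greatest power of two that divides a count can be found by isolating its lowest set bit. This is a minimal sketch under that assumption, not the code from the commit; the name pow2_divisor is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: return the greatest power of two that divides n.
 * n & -n keeps only the lowest set bit, which is exactly that divisor.
 */
unsigned int
pow2_divisor(unsigned int n)
{
	assert(n != 0);		/* zero has no lowest set bit */
	return n & (0u - n);	/* unsigned negation is well defined */
}
```

For example, a probed count of 12 colors would be reduced to 4, and a count that is already a power of two is returned unchanged.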


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


Revision tags: bouyer-xenpvh-base1
# 1.184 20-Apr-2020 msaitoh

Whitespace fix. No functional change.


Revision tags: phil-wifi-20200411
# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
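
The distinction drawn in this entry, between a min defined on unsigned int and the MIN macro form, can be sketched as follows. This is only an illustration: uimin_sketch and min64_via_uint are hypothetical names, not the libkern definitions, and the example assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Macro form: no truncation, but an argument may be evaluated twice. */
#ifndef MIN
#define MIN(a, b)	((a) < (b) ? (a) : (b))
#endif

/* Illustrative stand-in for a min that is defined on unsigned int. */
unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Passing 64-bit values through the unsigned int version drops the
 * high bits before the comparison happens (assumes 32-bit unsigned int).
 */
uint64_t
min64_via_uint(uint64_t a, uint64_t b)
{
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}
```

With a = 0x100000001 and b = 0x200000000, the function form compares the truncated values 1 and 0 and yields 0, while the macro compares the full 64-bit values and yields a.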


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
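
For context on the "rdi % 16 = 8" observation: FXSAVE requires its memory operand to be 16-byte aligned, so an address with remainder 8 faults. A minimal sketch of the corresponding check, using the hypothetical helper name is_aligned16:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
int
is_aligned16(const void *p)
{
	return ((uintptr_t)p % 16) == 0;
}
```

An address such as 0x1008 fails this check in the same way as the faulting case described above.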


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.
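
The correspondence stated in this entry between L4 slot 384 and address 0xffffc00000000000 can be checked with a little arithmetic. The sketch below assumes 4-level paging with 48-bit virtual addresses, where each L4 slot covers 2^39 bytes and bit 47 is sign-extended into the upper bits. L4_SHIFT_SKETCH and l4_slot_base are hypothetical names, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* 12-bit page offset plus three 9-bit table levels below L4. */
#define L4_SHIFT_SKETCH	39

/* Hypothetical helper: canonical base virtual address of an L4 slot. */
uint64_t
l4_slot_base(unsigned int slot)
{
	uint64_t va;

	assert(slot < 512);			/* an L4 table has 512 slots */
	va = (uint64_t)slot << L4_SHIFT_SKETCH;
	if (va & (UINT64_C(1) << 47))		/* upper half of the space */
		va |= UINT64_C(0xffff) << 48;	/* sign-extend bit 47 */
	return va;
}
```

Slot 384 is 0x180, and 0x180 shifted left by 39 is 0x0000c00000000000; sign extension of bit 47 then gives 0xffffc00000000000.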


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify code around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
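
The base/extended/display distinction above can be sketched in C. This is
an illustration only, not the kernel's macro definitions: the function
names are hypothetical, and the bit layout is the CPUID leaf 1 EAX
signature layout as documented in the Intel and AMD manuals. The display
family adds the extended family only when the base family is 0xf; the
display model prepends the extended model when the base family is 0x6 or
0xf.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature (hypothetical helper names). */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field only counts when the base family is 0xf. */
static inline uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);

	return (base == 0xf) ? base + sig_extfamily(sig) : base;
}

/* Display model: the extended field forms the high nibble for families 0x6 and 0xf. */
static inline uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t family = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (family == 0x6 || family == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, the signature 0x000306a9 decodes to display family 0x6,
display model 0x3a, stepping 9, while 0x00600f12 decodes to display
family 0x15, display model 0x01, stepping 2.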


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
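
As a hedged illustration of the operation involved: rounding a value up
to a power-of-two boundary is usually done by adding (alignment - 1) and
masking off the low bits. The sketch below uses a hypothetical function
name and is not the kernel's roundup2() macro itself; it assumes the
alignment is a nonzero power of two and that the addition does not
overflow.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of align.
 * Only valid when align is a nonzero power of two.
 */
static size_t
round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + (align - 1)) & ~(align - 1);
}
```

With a 64-byte alignment, 1 rounds up to 64, 64 stays 64, and 65 rounds
up to 128.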


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
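
The workaround described above needs the greatest power of two that
divides a given count. The commit message does not say how the kernel
computes it; one common way, sketched here with a hypothetical function
name, is to isolate the lowest set bit of the value, since that bit is
exactly the largest power of two dividing it.

```c
#include <assert.h>

/*
 * Return the greatest power of two that divides n.
 * n & (0 - n) keeps only the lowest set bit of n (unsigned arithmetic
 * wraps, so the negation is well defined). n must be nonzero.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (0u - n);
}
```

For a hypothetical count of 24 colors this yields 8; for 12 it yields 4;
a count that is already a power of two is returned unchanged.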


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.183 10-Apr-2020 bouyer

Revert, wrong branch


# 1.182 10-Apr-2020 bouyer

Skip cx8_spllower patch if we're running on any form of Xen PV,
we can't handle PV interrupts with a single atomic op here.
Enable x86_patch() for Xen too.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1
# 1.181 14-Jan-2020 pgoyette

branches: 1.181.4;
If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

branches: 1.179.2;
Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

branches: 1.171.2;
Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
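
A minimal userland sketch of the truncation this rename is meant to expose, assuming a platform where unsigned int is 32 bits wide. The names `sketch_uimin` and `SKETCH_MIN` are hypothetical stand-ins for the libkern function and the MIN-style macro:

```c
#include <assert.h>
#include <stdint.h>

/* Function form: both arguments are narrowed to unsigned int. */
static unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: evaluates its arguments twice, but never narrows them. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

static uint64_t
sketch_min_via_function(uint64_t a, uint64_t b)
{
	/* The casts spell out the conversion a plain call would do silently. */
	return sketch_uimin((unsigned int)a, (unsigned int)b);
}

static uint64_t
sketch_min_via_macro(uint64_t a, uint64_t b)
{
	return SKETCH_MIN(a, b);
}

/* Usage: 2^32 + 1 loses its high bits in the function form only. */
static void
sketch_uimin_example(void)
{
	uint64_t big = (UINT64_C(1) << 32) + 1;

	assert(sketch_min_via_function(big, 100) == 1);
	assert(sketch_min_via_macro(big, 100) == 100);
}
```

With the function form, the 64-bit value is reduced modulo 2^32 before the comparison, so the "minimum" comes out as 1 rather than 100.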


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.
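
The same-size rule can be modelled on an ordinary byte array. This is only a sketch of the idea, not the kernel's patching code: it does not touch live text, and the names `sketch_fill_placeholder` and `sketch_hotpatch` are hypothetical. The opcode values are the x86 short jump (0xEB) and int3 (0xCC):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_INT3     0xCC /* one-byte breakpoint opcode */
#define SKETCH_JMP_REL8 0xEB /* short jump, followed by an 8-bit displacement */

/* Fill a window with "jump over the rest", then int3 padding. */
static void
sketch_fill_placeholder(uint8_t *win, size_t len)
{
	/* The jump itself takes 2 bytes; rel8 can skip at most 127 more. */
	assert(len >= 2 && len - 2 <= 127);
	win[0] = SKETCH_JMP_REL8;
	win[1] = (uint8_t)(len - 2); /* relative to the end of the jump */
	memset(win + 2, SKETCH_INT3, len - 2);
}

/* Overwrite the window only if the replacement has exactly the same size. */
static int
sketch_hotpatch(uint8_t *win, size_t winlen, const uint8_t *repl, size_t repllen)
{
	if (repllen != winlen)
		return -1;
	memcpy(win, repl, winlen);
	return 0;
}

/* Usage: an 8-byte window patched with 8 placeholder bytes (0x90). */
static void
sketch_hotpatch_example(void)
{
	uint8_t win[8];
	const uint8_t repl[8] = { 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90 };

	sketch_fill_placeholder(win, sizeof(win));
	assert(win[0] == SKETCH_JMP_REL8 && win[1] == 6 && win[7] == SKETCH_INT3);
	assert(sketch_hotpatch(win, sizeof(win), repl, sizeof(repl)) == 0);
	assert(win[0] == 0x90);
}
```

Refusing a replacement of a different length is what keeps the code after the window at the same address, which is why the two chunks must be the same size.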

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
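
The slot-to-address arithmetic can be checked with a short sketch, assuming 4-level paging where each L4 slot covers 2^39 bytes and addresses must be canonical (bits 63:48 copy bit 47). The names `SKETCH_L4_SHIFT` and `sketch_l4_slot_to_va` are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_L4_SHIFT 39 /* each L4 slot spans 2^39 bytes (512 GiB) */

/* Base virtual address covered by a given L4 slot (0..511). */
static uint64_t
sketch_l4_slot_to_va(unsigned int slot)
{
	uint64_t va;

	assert(slot < 512);
	va = (uint64_t)slot << SKETCH_L4_SHIFT;
	/* Canonical form: if bit 47 is set, bits 63:48 must be set too. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff) << 48;
	return va;
}

/* Usage: slot 384 is the one named in the commit message. */
static void
sketch_l4_example(void)
{
	assert(sketch_l4_slot_to_va(384) == UINT64_C(0xffffc00000000000));
}
```

Slot 384 is 0x180, and 0x180 shifted left by 39 is 0x0000c00000000000; sign extension from bit 47 then gives the address quoted above.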

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
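
The split between base, extended and display values can be sketched as follows. This follows the usual convention for the CPUID leaf 1 signature (extended family added when the base family is 0xf, extended model prepended when the base family is 6 or 0xf); it is an illustration under that assumption rather than a copy of the macros in the tree, and the `sketch_` names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature (EAX). */
static uint32_t sketch_stepping(uint32_t sig)    { return sig & 0xf; }
static uint32_t sketch_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static uint32_t sketch_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static uint32_t sketch_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static uint32_t sketch_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field only counts when the base is 0xf. */
static uint32_t
sketch_display_family(uint32_t sig)
{
	uint32_t base = sketch_base_family(sig);

	return base == 0xf ? base + sketch_ext_family(sig) : base;
}

/* Display model: the extended field only counts for base family 6 or 0xf. */
static uint32_t
sketch_display_model(uint32_t sig)
{
	uint32_t base = sketch_base_family(sig);
	uint32_t model = sketch_base_model(sig);

	if (base == 0x6 || base == 0xf)
		model |= sketch_ext_model(sig) << 4;
	return model;
}

/* Usage: decode two example signature values. */
static void
sketch_cpuid_example(void)
{
	assert(sketch_display_family(0x000906ea) == 0x6);
	assert(sketch_display_model(0x000906ea) == 0x9e);
	assert(sketch_stepping(0x000906ea) == 0xa);
	assert(sketch_display_family(0x00800f11) == 0x17);
	assert(sketch_display_model(0x00800f11) == 0x01);
}
```

Keeping the base and extended accessors separate from the display ones mirrors the naming split described in the commit, so callers that want the combined value do not reassemble it by hand.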


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
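
The fallback described above can be sketched in user-space C. This is a
minimal sketch under stated assumptions, not the committed kernel code:
sketch_cpu, has_msr, read_tsc_msr and read_tsc_counter are hypothetical
stand-ins for the CPUID_MSR check, rdmsr(MSR_TSC) and cpu_counter().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one CPU; none of these names come from the kernel. */
struct sketch_cpu {
	bool has_msr;				/* stands in for the CPUID_MSR bit */
	uint64_t (*read_tsc_msr)(void);		/* stands in for rdmsr(MSR_TSC) */
	uint64_t (*read_tsc_counter)(void);	/* stands in for cpu_counter() */
};

/* Use the MSR read when rdmsr is supported, the plain counter otherwise. */
static uint64_t
sketch_counter_serializing(const struct sketch_cpu *cpu)
{
	assert(cpu != NULL);
	if (cpu->has_msr)
		return cpu->read_tsc_msr();
	return cpu->read_tsc_counter();
}

/* Fake readers that return distinct values so the two paths can be told apart. */
static uint64_t fake_msr_read(void) { return 1000; }
static uint64_t fake_counter_read(void) { return 2000; }
```

A CPU modelled with has_msr set to false (as the log describes for
vortex86) takes the read_tsc_counter path and never touches the MSR reader.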


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
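
The sizes quoted above follow directly from the address widths. As a small
worked check (the helper name addressable_gib is hypothetical and only
illustrates the arithmetic):

```c
#include <assert.h>
#include <stdint.h>

/* Size in GiB of an address space that is `bits` wide (valid for 30..63). */
static uint64_t
addressable_gib(unsigned int bits)
{
	assert(bits >= 30 && bits < 64);
	/* 2^bits bytes divided by 2^30 bytes per GiB. */
	return (UINT64_C(1) << bits) >> 30;
}
```

With 36 bits this gives 64 GiB of physical address space, and with 32 bits
it gives the 4 GiB virtual address space.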

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
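
The hardcoded operation that was replaced is not shown in this log. As an
assumption, the usual power-of-two rounding idiom that a roundup2()-style
helper wraps can be sketched as follows (sketch_roundup_pow2 is a
hypothetical name, not the kernel macro):

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of m; m must be a nonzero power of two. */
static uintptr_t
sketch_roundup_pow2(uintptr_t x, uintptr_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	/* Adding m-1 crosses into the next multiple; the mask clears the rest. */
	return (x + (m - 1)) & ~(m - 1);
}
```

Values that are already a multiple of m are returned unchanged.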


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
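
One way to compute "the greatest power of two which is a divisor" of a
value is to isolate its lowest set bit. This is a sketch of that technique
under a hypothetical name; the log does not say how the committed code
computes it.

```c
#include <assert.h>

/* Greatest power of two that divides n (n > 0): keep only the lowest set bit. */
static unsigned int
sketch_pow2_divisor(unsigned int n)
{
	assert(n > 0);
	/* 0u - n is the unsigned two's complement negation of n. */
	return n & (0u - n);
}
```

For example, a desired count of 12 colors would be reduced to 4, while a
count that is already a power of two is left as it is.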


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.181 14-Jan-2020 pgoyette

If "application processors" were skipped/disabled at boot time (due to
RB_MD1 being set), don't try to examine the featurebus info, since it
was never retrieved. Addresses kern/54815

XXX pullup-9


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
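
The truncation hazard described above can be illustrated with a small
sketch. The names sketch_uimin, SKETCH_MIN and sketch_truncating_min are
hypothetical stand-ins, not the libkern definitions, and the example
assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* This sketch assumes unsigned int is 32 bits wide. */
static_assert(sizeof(unsigned int) * CHAR_BIT == 32,
    "sketch assumes 32-bit unsigned int");

/* Function form: both arguments are converted to unsigned int first. */
static unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: compares in the operands' own type, but evaluates them twice. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Passing 64-bit values through the function form keeps only the low 32 bits. */
static uint64_t
sketch_truncating_min(uint64_t a, uint64_t b)
{
	return sketch_uimin((unsigned int)a, (unsigned int)b);
}
```

Called with 0x100000001 and 5, the function form compares 1 and 5 and
returns 1, whereas the macro form compares the full 64-bit values and
yields 5. The casts are written out only to make the conversion visible;
an implicit call would convert the same way.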


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.
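
The equal-size constraint above can be illustrated with a small userland
sketch. It is not the kernel's hotpatch code: the function names are
hypothetical, the "text" is an ordinary byte buffer, and the steps a real
kernel needs (making text writable, serializing the instruction stream) are
left out. It only shows a placeholder built from a short jmp plus int3
padding, and a patch step that refuses a replacement of a different size.

```c
#include <stdint.h>
#include <string.h>

#define OP_JMP_SHORT 0xEB	/* jmp rel8 */
#define OP_INT3      0xCC	/* breakpoint, traps if ever executed */

/*
 * Fill a patch site with "jmp past-the-end; int3; int3; ...".
 * The rel8 displacement is relative to the end of the 2-byte jmp,
 * so it equals the number of padding bytes. Returns 0 on success,
 * -1 if the site is too small or too large for a short jump.
 */
int
hotpatch_fill_placeholder(uint8_t *site, size_t len)
{
	if (len < 2 || len - 2 > 127)
		return -1;
	site[0] = OP_JMP_SHORT;
	site[1] = (uint8_t)(len - 2);
	memset(site + 2, OP_INT3, len - 2);
	return 0;
}

/*
 * Overwrite the placeholder with the real instruction bytes.
 * Both chunks must have exactly the same size, otherwise the code
 * following the site would be corrupted or left unreachable.
 */
int
hotpatch_apply(uint8_t *site, size_t site_len,
    const uint8_t *repl, size_t repl_len)
{
	if (site_len != repl_len)
		return -1;
	memcpy(site, repl, repl_len);
	return 0;
}
```

For an 8-byte site the placeholder is EB 06 followed by six CC bytes, and an
attempt to apply a 7-byte replacement is rejected.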

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
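
The slot-to-address arithmetic can be checked with a short sketch. It
assumes 4-level amd64 paging, where each L4 entry covers 2^39 bytes and
virtual addresses are sign-extended from bit 47; the macro and function
names are illustrative and are not taken from the NetBSD sources.

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT      39u	/* each L4 entry maps 2^39 bytes (512 GiB) */
#define L4_SLOTS      512u	/* entries in one L4 page */
#define PCPU_L4_SLOT  384u	/* slot number quoted in the commit message */

static_assert(PCPU_L4_SLOT < L4_SLOTS, "slot must fit in the L4 page");

/* Return the canonical virtual address at which an L4 slot begins. */
uint64_t
l4_slot_to_va(unsigned slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT;

	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

With slot 384 the shift gives 0x0000c00000000000, and sign extension turns
that into 0xffffc00000000000, matching the address above.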

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
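
The split between base, extended and display values can be sketched as
follows. The names are hypothetical stand-ins, not the macros from
specialreg.h, and the combination rule is the usual convention for the
CPUID leaf 1 signature: the extended family is added only when the base
family is 0xf, and the extended model is prepended when the base family is
0x6 or 0xf. Treat it as an assumption about what the new macros compute.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature (illustrative names). */
#define SIG_STEPPING(s)    ((s) & 0xfu)
#define SIG_BASEMODEL(s)   (((s) >> 4) & 0xfu)
#define SIG_BASEFAMILY(s)  (((s) >> 8) & 0xfu)
#define SIG_EXTMODEL(s)    (((s) >> 16) & 0xfu)
#define SIG_EXTFAMILY(s)   (((s) >> 20) & 0xffu)

/* Display family: base family, plus extended family if base is 0xf. */
unsigned
display_family(uint32_t sig)
{
	unsigned fam = SIG_BASEFAMILY(sig);

	if (fam == 0xf)
		fam += SIG_EXTFAMILY(sig);
	return fam;
}

/* Display model: extended model forms the high nibble for 0x6 and 0xf. */
unsigned
display_model(uint32_t sig)
{
	unsigned fam = SIG_BASEFAMILY(sig);
	unsigned model = SIG_BASEMODEL(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= SIG_EXTMODEL(sig) << 4;
	return model;
}
```

For the signature 0x000906ea this yields family 0x6 and model 0x9e; for
0x00800f11 it yields family 0x17 and model 0x01.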


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
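
The sizes quoted above follow directly from the address widths. A minimal
sketch of the arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Number of bytes addressable with the given address width.
 * Widths of 64 bits or more do not fit in a uint64_t count, so the
 * helper returns 0 for them instead of shifting out of range.
 */
uint64_t
addressable_bytes(unsigned bits)
{
	if (bits >= 64)
		return 0;
	return UINT64_C(1) << bits;
}
```

Dividing by 2^30 gives the GiB figures: 36 physical bits cover 64 GiB,
while the 32-bit virtual address space stays at 4 GiB.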

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
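
For reference, the operation that roundup2() stands for is rounding up to a
multiple of a power of two with a mask. The sketch below uses a hypothetical
function name and assumes the macro behaves like the usual
"(x + m - 1) & ~(m - 1)" idiom; it is not copied from the NetBSD headers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of align.
 * Only valid when align is a nonzero power of two, because the mask
 * ~(align - 1) clears exactly the low bits in that case.
 */
size_t
round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}
```

With a 4096-byte alignment, 4097 rounds up to 8192 and 4096 stays at 4096.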


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
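
As a rough illustration of the workaround described above (a sketch, not
the code from this commit): the greatest power of two that divides a
nonzero count is that count's lowest set bit. The helper name
largest_pow2_divisor is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the greatest power of two that divides n.
 * For nonzero n this is the lowest set bit of n; n == 0 yields 0.
 */
static uint32_t
largest_pow2_divisor(uint32_t n)
{
	/* Unsigned negation wraps, so n & -n isolates the lowest set bit. */
	return n & (0u - n);
}
```

With this helper, a probe that reports 24 colors (8 * 3) would be
rounded down to 8, and a count that is already a power of two is
returned unchanged.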


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.180 08-Jan-2020 ad

Make "mach cpu" in ddb show the IPL for each cpu.


Revision tags: ad-namecache-base
# 1.179 20-Dec-2019 ad

Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
it is now obtained earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
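
The two steps above can be sketched with a simplified, single-threaded
model. This is not the kCSan implementation: all names (model_cell,
model_cell_install, model_access_races, MODEL_NCPU) are hypothetical, the
sampling counter and the delay are left out, and the rule that two reads
never conflict is an assumption of the model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NCPU	4	/* hypothetical CPU count for this model */

/* One cell per CPU, describing the sampled access that CPU is delaying on. */
struct model_cell {
	bool		active;
	uintptr_t	addr;
	size_t		size;
	bool		write;
};

static struct model_cell model_cells[MODEL_NCPU];

/* Step 1: a sampled access publishes a cell (the real code then delays). */
static void
model_cell_install(unsigned cpu, uintptr_t addr, size_t size, bool write)
{
	model_cells[cpu] = (struct model_cell){ true, addr, size, write };
}

/* End of the delay: the cell is withdrawn. */
static void
model_cell_clear(unsigned cpu)
{
	model_cells[cpu].active = false;
}

/*
 * Step 2: every access checks the other CPUs' cells. A race is reported
 * when the byte ranges overlap and at least one side is a write.
 */
static bool
model_access_races(unsigned cpu, uintptr_t addr, size_t size, bool write)
{
	for (unsigned i = 0; i < MODEL_NCPU; i++) {
		const struct model_cell *c = &model_cells[i];

		if (i == cpu || !c->active)
			continue;
		if (addr + size <= c->addr || c->addr + c->size <= addr)
			continue;	/* ranges do not overlap */
		if (write || c->write)
			return true;
	}
	return false;
}
```

In this model, if cpu0 installs a read cell for some address and cpu1
then writes to an overlapping range, model_access_races() returns true
for cpu1, which mirrors the cpu0/cpu1 scenario described above.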

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
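
A minimal sketch of the difference, assuming a platform where unsigned
int is 32 bits wide. The names demo_uimin and DEMO_MIN are hypothetical
stand-ins for the function-style and macro-style definitions discussed
above, not the libkern code itself.

```c
#include <stdint.h>

/* Function form: both operands are converted to unsigned int. */
static unsigned int
demo_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no narrowing, but each argument may be evaluated twice. */
#define DEMO_MIN(a, b)	((a) < (b) ? (a) : (b))

static uint64_t
demo_min_via_function(uint64_t a, uint64_t b)
{
	/* The implicit conversion silently drops the high 32 bits. */
	return demo_uimin(a, b);
}

static uint64_t
demo_min_via_macro(uint64_t a, uint64_t b)
{
	/* The comparison is done in 64 bits; nothing is truncated. */
	return DEMO_MIN(a, b);
}
```

Under that assumption, calling both wrappers with 0x100000001 and 5
gives different answers: the function form compares 1 with 5 and returns
1, while the macro form returns 5.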

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
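
The slot-to-address arithmetic can be checked with a short sketch. It
assumes 4-level paging with 48-bit virtual addresses, where each L4
entry covers 2^39 bytes and bits 63..48 must replicate bit 47. The names
DEMO_L4_SHIFT and demo_l4_slot_to_va are hypothetical.

```c
#include <stdint.h>

#define DEMO_L4_SHIFT	39	/* each L4 entry maps 2^39 bytes (512 GiB) */

/* Return the canonical base virtual address covered by an L4 slot. */
static uint64_t
demo_l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << DEMO_L4_SHIFT;

	/* Canonical form: sign-extend bit 47 into bits 63..48. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

For slot 384 (0x180), the shift gives 0x0000c00000000000; bit 47 is set,
so sign extension yields 0xffffc00000000000, matching the address quoted
above.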

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.
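
A minimal sketch of the layout idea, not the actual struct cpu_info: the
field names, the number of temporary VAs, and the 64-byte line size are
all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE	64	/* assumed cache line size in bytes */
#define DEMO_NTMPVA	4	/* assumed number of tmp VAs per CPU */

/* Hypothetical per-CPU structure with the tmp VAs embedded in it. */
struct demo_cpu_info {
	int		ci_index;
	/* Start the tmp VA array on its own cache line. */
	_Alignas(DEMO_CACHE_LINE) uintptr_t ci_tmpva[DEMO_NTMPVA];
};
```

Because each CPU's structure carries its own aligned array, one is
allocated per attached CPU rather than maxcpus entries up front, and no
two CPUs' tmp VAs share a cache line as long as the structures
themselves are allocated with that alignment.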

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of CPUs
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
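
The split between "base", "extended" and "display" fields can be sketched as
follows. This is an illustration only, assuming the usual layout of the
CPUID leaf 1 EAX signature and the common vendor convention for combining
the fields; the function names are hypothetical and are not the macros
named above.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature (assumed standard layout). */
static uint32_t sig_stepping(uint32_t sig)    { return sig & 0xf; }
static uint32_t sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static uint32_t sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static uint32_t sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base family is 0xf. */
static uint32_t sig_display_family(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);

	return (fam == 0xf) ? fam + sig_ext_family(sig) : fam;
}

/* Display model: the extended model supplies the high nibble for family 6 or 0xf. */
static uint32_t sig_display_model(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

Keeping the raw accessors separate from the display helpers mirrors the
naming split in the commit message: callers that want what vendor
documentation calls "family" and "model" use the display variants.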


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

Wrap code in #ifdef/#endif for NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep, and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by kernel and MMU.
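
The sizes quoted above follow directly from the address widths. A minimal
sketch of the arithmetic (the helper names are hypothetical and exist only
for this illustration):

```c
#include <stdint.h>

/* Number of bytes addressable with the given address width (bits < 64). */
static uint64_t addressable_bytes(unsigned int bits)
{
	return UINT64_C(1) << bits;
}

/* Convert a byte count to whole GiB (1 GiB = 2^30 bytes). */
static uint64_t bytes_to_gib(uint64_t bytes)
{
	return bytes >> 30;
}
```

A 36-bit value does not fit in a 32-bit integer, which is why the physical
address type has to widen to 64 bits even though virtual addresses stay at
32 bits.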

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
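
For readers unfamiliar with the idiom, rounding up to a power-of-two
boundary is typically a mask operation. The function below is a hypothetical
stand-in written for illustration, not the kernel's roundup2() macro itself,
and it assumes that align is a nonzero power of two.

```c
#include <stdint.h>

/*
 * Round x up to the next multiple of align.
 * Assumption: align is a nonzero power of two, so (align - 1) is a mask of
 * the low bits that must be cleared after adding the bias.
 */
static uintptr_t round_up_pow2(uintptr_t x, uintptr_t align)
{
	return (x + (align - 1)) & ~(align - 1);
}
```

Values that are already aligned are returned unchanged, which is what makes
a named helper preferable to open-coding the expression at each call site.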


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
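
The "greatest power of two which is a divisor" can be computed with a
single bit trick. The sketch below is an illustration under that reading of
the message; the function name is hypothetical and this is not claimed to be
the code that was committed.

```c
/*
 * Return the greatest power of two that divides n.
 * For unsigned n, (0 - n) is the two's complement of n, so n & (0 - n)
 * isolates the lowest set bit, which is exactly that divisor.
 * Returns 0 when n is 0.
 */
static unsigned int greatest_pow2_divisor(unsigned int n)
{
	return n & (0u - n);
}
```

For example, a cache reporting 24 colors would be reduced to 8, and one
reporting 48 colors to 16, while a count that is already a power of two is
left unchanged.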


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
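
The general pattern of adjusting a shared flags word without a lock can be
sketched with C11 atomics. This is an illustration only: the struct, the
flag names and the helpers are hypothetical, and the kernel uses its own
atomic primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define XF_PRESENT 0x01u
#define XF_RUNNING 0x02u

/* Hypothetical per-CPU record holding an atomically updated flags word. */
struct cpu_flags_demo {
	_Atomic uint32_t flags;
};

/* Set bits with an atomic read-modify-write, so concurrent updates are not lost. */
static void demo_flags_set(struct cpu_flags_demo *c, uint32_t bits)
{
	atomic_fetch_or(&c->flags, bits);
}

/* Clear bits atomically. */
static void demo_flags_clear(struct cpu_flags_demo *c, uint32_t bits)
{
	atomic_fetch_and(&c->flags, ~bits);
}

/* Read the current flags word. */
static uint32_t demo_flags_get(struct cpu_flags_demo *c)
{
	return atomic_load(&c->flags);
}
```

A plain "flags |= bit" is a separate load and store, so two CPUs updating
different bits at the same time could overwrite each other; the atomic
read-modify-write avoids that.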


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.179 20-Dec-2019 ad

Some more CPU topology stuff:

- Use cegger@'s ACPI SRAT parsing code to figure out NUMA node ID for each
CPU as it is attached.

- For scheduler experiments with SMT, flag CPUs with the lowest numbered SMT
IDs as "primaries", link back to the primaries from secondaries, and build
a circular list of CPUs in each package with identical SMT IDs.

- No need for package/core/smt/numa IDs to be anything other than a u_int.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Currently, it is obtained in config_interrupts context.
However, since it is required when attaching a device,
obtain it earlier than that.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.
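
The cell check described above can be sketched in C as follows. This is a
single-threaded illustration only: the sketch_* names, the fixed two-CPU
table and the byte-range overlap test are assumptions made for the example,
not the kCSan code, and the sampling/delay step is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NCPU 2	/* hypothetical: one cell slot per CPU */

/* One "cell" describes a sampled access that is currently being delayed. */
struct sketch_cell {
	bool		active;
	uintptr_t	addr;
	size_t		size;
	bool		write;
};

static struct sketch_cell sketch_cells[SKETCH_NCPU];

/* Record a sampled access; the real sanitizer would then delay the CPU. */
static void
sketch_cell_install(int cpu, uintptr_t addr, size_t size, bool write)
{
	sketch_cells[cpu].active = true;
	sketch_cells[cpu].addr = addr;
	sketch_cells[cpu].size = size;
	sketch_cells[cpu].write = write;
}

/* Drop the cell once the delay is over. */
static void
sketch_cell_clear(int cpu)
{
	sketch_cells[cpu].active = false;
}

/*
 * Check an access against the cells of all other CPUs. A collision is
 * reported when the byte ranges overlap and at least one side is a write.
 */
static bool
sketch_access_races(int cpu, uintptr_t addr, size_t size, bool write)
{
	for (int i = 0; i < SKETCH_NCPU; i++) {
		const struct sketch_cell *c = &sketch_cells[i];

		if (i == cpu || !c->active)
			continue;
		if (!write && !c->write)
			continue;	/* two reads never race */
		if (addr < c->addr + c->size && c->addr < addr + size)
			return true;
	}
	return false;
}
```

With a read cell installed for cpu0, a write from cpu1 to an overlapping
address is reported, which is the cpu0/cpu1 example given in the message.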


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Add Microsoft Hyper-V support, ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
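
The truncation in question can be illustrated with a small C sketch. The
sketch_* names are hypothetical stand-ins, not the libkern definitions, and
the example assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Stand-in with the unsigned int signature described above. */
static unsigned int
sketch_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: evaluates its arguments twice but keeps their type. */
#define SKETCH_MIN(a, b)	((a) < (b) ? (a) : (b))

/*
 * Passing 64-bit values to the function silently converts them to
 * unsigned int, so the upper 32 bits are dropped before the comparison.
 */
static uint64_t
sketch_truncating_min(uint64_t a, uint64_t b)
{
	return sketch_uimin(a, b);
}

/* The macro compares the full 64-bit values. */
static uint64_t
sketch_macro_min(uint64_t a, uint64_t b)
{
	return SKETCH_MIN(a, b);
}
```

For a = 2^32 + 5 and b = 10, the function form compares 5 with 10 and
returns 5, while the macro form returns 10.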


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.
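
The relation between L4 slot 384 and address 0xffffc00000000000 can be
checked with a short C sketch. It assumes 4-level paging with 48-bit
virtual addresses, where each L4 slot covers 2^39 bytes and addresses with
bit 47 set are sign-extended; the sketch_* names are made up for this
example.

```c
#include <stdint.h>

#define SKETCH_L4_SHIFT	39	/* each L4 slot maps 512 GiB */

/* Return the canonical base virtual address covered by an L4 slot. */
static uint64_t
sketch_l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << SKETCH_L4_SHIFT;

	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

Slot 384 is 0x180, and 0x180 << 39 is 0x0000c00000000000; sign extension
then gives the address quoted above.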


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so that it does not break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
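
The split between base, extended and display fields can be sketched in C as
below. The sketch_* functions are hypothetical and are not the macros from
the commit; the display-family and display-model rules follow the convention
in Intel's CPUID documentation (extended family added when the base family
is 0xf, extended model used when the base family is 0x6 or 0xf), which is
assumed here rather than taken from the NetBSD definitions.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static unsigned int
sketch_stepping(uint32_t sig)
{
	return sig & 0xf;		/* bits 3:0 */
}

static unsigned int
sketch_basemodel(uint32_t sig)
{
	return (sig >> 4) & 0xf;	/* bits 7:4 */
}

static unsigned int
sketch_basefamily(uint32_t sig)
{
	return (sig >> 8) & 0xf;	/* bits 11:8 */
}

static unsigned int
sketch_extmodel(uint32_t sig)
{
	return (sig >> 16) & 0xf;	/* bits 19:16 */
}

static unsigned int
sketch_extfamily(uint32_t sig)
{
	return (sig >> 20) & 0xff;	/* bits 27:20 */
}

/* Display family: the extended field only counts when base is 0xf. */
static unsigned int
sketch_family(uint32_t sig)
{
	unsigned int fam = sketch_basefamily(sig);

	if (fam == 0xf)
		fam += sketch_extfamily(sig);
	return fam;
}

/* Display model: the extended field supplies the upper four bits. */
static unsigned int
sketch_model(uint32_t sig)
{
	unsigned int fam = sketch_basefamily(sig);
	unsigned int model = sketch_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sketch_extmodel(sig) << 4;
	return model;
}
```

For the signature 0x000906ea this yields family 0x6, model 0x9e and
stepping 0xa; for 0x00800f11 it yields family 0x17 and model 0x01.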


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
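
As a worked illustration of the sizes quoted above (the helper name is
hypothetical and this is plain userland arithmetic, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an address space with the given number of address bits. */
static uint64_t
addr_space_bytes_sketch(unsigned int bits)
{
	assert(bits < 64);
	return 1ULL << bits;
}

/* Does a physical address survive being stored in a 32-bit variable? */
static int
fits_in_32_sketch(uint64_t pa)
{
	return (uint64_t)(uint32_t)pa == pa;
}
```

A 36-bit physical address such as 0x800000000 does not fit in 32 bits, which
is why physical addresses have to be carried in 64-bit variables under PAE.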

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
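
For reference, the hardcoded operation that roundup2() replaces can be
sketched as below. This is a userland illustration assuming the usual
power-of-two definition; roundup2_sketch is a hypothetical name, not the
kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of m; valid only when m is a power of two. */
static inline uint64_t
roundup2_sketch(uint64_t x, uint64_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);	/* m must be a power of two */
	return (x + m - 1) & ~(m - 1);
}
```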


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.
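
For readers unfamiliar with the primitive being patched, the compare-and-swap
contract can be sketched in portable C11. This only illustrates the semantics
(return the old value; store only if it matched); it says nothing about how
the kernel hotpatches the instruction sequence, and cas64_sketch is a
hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Returns the value found at *p. The store of 'desired' happened if and
 * only if that returned value equals 'expected'.
 */
static uint64_t
cas64_sketch(_Atomic uint64_t *p, uint64_t expected, uint64_t desired)
{
	/* On failure, C11 writes the current value back into 'expected'. */
	atomic_compare_exchange_strong(p, &expected, desired);
	return expected;
}
```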


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.
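
The skew accounting mentioned in the list above can be illustrated with a
small arithmetic sketch. The function names and the sign convention (skew is
boot processor minus AP) are assumptions made for illustration and are not
taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Skew of an AP relative to the boot processor, sampled at the same instant. */
static int64_t
tsc_skew_sketch(uint64_t bp_tsc, uint64_t ap_tsc)
{
	return (int64_t)(bp_tsc - ap_tsc);
}

/* Apply the recorded skew so the AP's reading lines up with the BP's timeline. */
static uint64_t
tsc_adjusted_sketch(uint64_t ap_raw, int64_t skew)
{
	return ap_raw + (uint64_t)skew;
}
```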


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
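
The selection rule described above can be sketched in userland C as follows.
The stub routines and every name ending in _sketch or _stub are hypothetical;
real idle routines would execute HLT, MWAIT, or a Xen hypercall.

```c
#include <assert.h>
#include <stdbool.h>

typedef void (*idle_fn_t)(void);

/* Empty stubs standing in for the real idle routines. */
static void idle_halt_stub(void)  { }
static void idle_mwait_stub(void) { }
static void idle_xen_stub(void)   { }

static idle_fn_t idle_fn_sketch = idle_halt_stub;

/* cpu_idle as a macro that calls through a function pointer. */
#define cpu_idle_sketch()	((*idle_fn_sketch)())

/* Xen first, then MWAIT if supported and not AMD, otherwise halt. */
static idle_fn_t
select_idle_sketch(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		return idle_xen_stub;
	if (has_mwait && !is_amd)
		return idle_mwait_stub;
	return idle_halt_stub;
}
```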


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
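
One common way to compute the greatest power of two that divides a number is
to isolate its lowest set bit. The sketch below shows that technique; it is
not necessarily how the committed code does it, and the function name is
hypothetical.

```c
#include <assert.h>

/* Largest power of two dividing n, obtained by isolating the lowest set bit. */
static unsigned int
largest_pow2_divisor_sketch(unsigned int n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

For example, 12 desired colors would be reduced to 4, and 24 to 8.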


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.178 07-Dec-2019 nonaka

Get a Hyper-V virtual processor id in cpu_hatch().

Previously, it was obtained in config_interrupts context.
However, since it is required when attaching a device,
it is now obtained earlier.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
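
The collision check described above can be sketched as a pure function over
one cell. The structure layout and names are hypothetical, and the rule that
two reads never conflict is an assumption of this sketch rather than a
statement from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cell describing one sampled memory access. */
struct cell_sketch {
	uintptr_t addr;		/* start address of the sampled access */
	size_t size;		/* size of the sampled access in bytes */
	bool write;		/* true if the sampled access was a write */
	bool armed;		/* true while the sampling CPU is delayed */
};

/* Does a new access overlap an armed cell, with at least one side writing? */
static bool
access_conflicts_sketch(const struct cell_sketch *c, uintptr_t addr,
    size_t size, bool write)
{
	if (!c->armed)
		return false;
	if (!c->write && !write)
		return false;	/* assumed: two reads never race */
	return addr < c->addr + c->size && c->addr < addr + size;
}
```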

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
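
The truncation hazard that motivates the rename can be demonstrated in
userland. This sketch assumes a platform where unsigned int is 32 bits; the
names ending in _sketch are hypothetical and only mirror the shape of the
functions discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Function-style minimum defined on unsigned int, like uimin. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro-style minimum: may evaluate arguments twice, but never truncates. */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/* What a caller gets when 64-bit values pass through the function version. */
static uint64_t
truncating_min64_sketch(uint64_t a, uint64_t b)
{
	/* The casts spell out the conversion that would otherwise be implicit. */
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}
```

With a = 2^32 + 5 and b = 10, the function version yields 5 because the high
bits of a are discarded, while the macro version yields the correct 10.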


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.
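
A minimal model of the dependency, assuming nothing about the real code
beyond the two bit positions (CR4 bit 18 and CPUID leaf 1 %ecx bit 27):
struct model_cpu and the model_* functions are hypothetical, and feat_val1
only stands in for ci_feat_val[1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_OSXSAVE	0x00040000u	/* CR4 bit 18 */
#define CPUID2_OSXSAVE	0x08000000u	/* CPUID leaf 1, %ecx bit 27 */

/* Hypothetical model of one CPU: its CR4 and a cached feature word. */
struct model_cpu {
	uint32_t cr4;
	uint32_t hw_ecx;	/* %ecx bits that do not depend on CR4 */
	uint32_t feat_val1;	/* cached copy, standing in for ci_feat_val[1] */
};

/* What CPUID leaf 1 would report in %ecx for the current CR4. */
static uint32_t
model_cpuid1_ecx(const struct model_cpu *cpu)
{
	uint32_t ecx = cpu->hw_ecx & ~CPUID2_OSXSAVE;

	if (cpu->cr4 & CR4_OSXSAVE)
		ecx |= CPUID2_OSXSAVE;
	return ecx;
}

/* Set bits in CR4, then refresh the cached feature word. */
static void
model_set_cr4(struct model_cpu *cpu, uint32_t bits)
{
	assert(cpu != NULL);
	cpu->cr4 |= bits;
	cpu->feat_val1 = model_cpuid1_ecx(cpu);
}
```

Without the refresh in model_set_cr4(), the cached word would keep
reporting the feature as absent after CR4 had been changed.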


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
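
The slot-to-address arithmetic can be checked with a short sketch, assuming
4-level paging with 48-bit canonical addresses (each L4 slot covers 2^39
bytes). l4_slot_to_va() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT	39	/* each L4 slot covers 2^39 bytes (512 GiB) */
#define L4_NSLOTS	512	/* 9 index bits per paging level */

/* Hypothetical helper: base virtual address of an amd64 L4 slot. */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	assert(slot < L4_NSLOTS);
	uint64_t va = (uint64_t)slot << L4_SHIFT;

	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

For slot 384 (0x180), the shift gives 0x0000c00000000000; bit 47 is set, so
sign extension yields 0xffffc00000000000.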

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
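
As an illustration of the difference between the base fields and the
display values, here is a sketch that decodes a CPUID leaf 1 signature
using the usual x86 convention (the extended family is added only when the
base family is 0xf; the extended model is prepended only when the base
family is 0x6 or 0xf). struct cpu_sig and cpu_sig_decode() are hypothetical
names and are not the macros from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cpu_sig {
	unsigned int family;	/* display family */
	unsigned int model;	/* display model */
	unsigned int stepping;
};

/* Decode the %eax value returned by CPUID leaf 1. */
static void
cpu_sig_decode(uint32_t sig, struct cpu_sig *out)
{
	assert(out != NULL);
	unsigned int basefamily = (sig >> 8) & 0xf;
	unsigned int basemodel = (sig >> 4) & 0xf;
	unsigned int extfamily = (sig >> 20) & 0xff;
	unsigned int extmodel = (sig >> 16) & 0xf;

	out->stepping = sig & 0xf;

	/* The extended family only counts when the base field is saturated. */
	out->family = basefamily;
	if (basefamily == 0xf)
		out->family += extfamily;

	/* The extended model supplies the high nibble for families 6 and 15. */
	out->model = basemodel;
	if (basefamily == 0x6 || basefamily == 0xf)
		out->model |= extmodel << 4;
}
```

For example, the signature 0x000906ea decodes to family 0x6, model 0x9e,
stepping 0xa, and 0x00800f11 decodes to family 0x17, model 0x01, stepping
0x1.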


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@
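
The fallback described above amounts to a feature-gated dispatch. The
sketch below models it in userland with function pointers: struct
counter_ops, counter_serializing() and the fake_* readers are hypothetical
stand-ins, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ways of reading the counter. */
typedef uint64_t (*counter_fn)(void);

struct counter_ops {
	bool has_msr;			/* models the CPUID_MSR feature bit */
	counter_fn read_msr_tsc;	/* models rdmsr(MSR_TSC) */
	counter_fn read_counter;	/* models cpu_counter() */
};

/* Prefer the MSR read when the CPU supports it, else the plain counter. */
static uint64_t
counter_serializing(const struct counter_ops *ops)
{
	assert(ops != NULL && ops->read_counter != NULL);
	if (ops->has_msr && ops->read_msr_tsc != NULL)
		return ops->read_msr_tsc();
	return ops->read_counter();
}

/* Fixed-value fakes so the dispatch can be observed in a test. */
static uint64_t fake_msr_tsc(void) { return 1000; }
static uint64_t fake_counter(void) { return 2000; }
```

A CPU modelled without the MSR feature never reaches the MSR path, which is
the property the commit relies on for vortex86.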


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
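
For reference, rounding up to a power-of-two boundary can be written as
below. This is a sketch of the general technique under the assumption that
the alignment is a power of two; round_up_pow2() is a hypothetical name and
not necessarily how the roundup2() macro is defined.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of m, where m is a power of two. */
static uint64_t
round_up_pow2(uint64_t x, uint64_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + m - 1) & ~(m - 1);
}
```

For example, round_up_pow2(13, 8) is 16, and a value that is already
aligned, such as round_up_pow2(16, 8), is returned unchanged.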


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
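
The workaround described in this entry amounts to isolating the lowest set
bit of the color count. A minimal sketch in C, assuming an unsigned count;
the function name is hypothetical and this is not the code from the commit:

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the greatest power of two that divides n.
 * For unsigned n, (n & -n) isolates the lowest set bit, which is exactly
 * that divisor. By convention in this sketch, 0 is returned for n == 0.
 */
uint32_t
pow2_divisor(uint32_t n)
{
	return n & (0u - n);
}
```

For instance, a probed count of 24 colors would be reduced to 8, while a
count that is already a power of two is left unchanged.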


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.177 27-Nov-2019 maxv

Add a small API for in-kernel FPU operations.

fpu_kern_enter();
/* do FPU stuff */
fpu_kern_leave();


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
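
The truncation hazard described in this entry can be illustrated with a
small sketch. The names below are hypothetical stand-ins, not the libkern
definitions, and the sketch assumes a platform where unsigned int is 32
bits wide:

```c
#include <stdint.h>

/* Stand-in for a min() that is defined on unsigned int. */
unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no conversion of the arguments, but evaluates them twice. */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/*
 * Passing 64-bit values through the function converts them to unsigned
 * int silently, so with a 32-bit unsigned int only the low 32 bits are
 * compared and returned.
 */
uint64_t
truncating_min(uint64_t a, uint64_t b)
{
	return uimin_sketch(a, b);
}
```

With these definitions, taking the minimum of 0x100000001 and 5 through
the function yields 1 (the low 32 bits of the first argument), whereas the
macro yields 5. An explicit name such as uimin makes that narrowing
visible at the call site.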


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
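
The distinction between the base, extended and display values can be
sketched as follows. This is a minimal illustration assuming the usual
layout of the CPUID leaf 1 EAX signature and the commonly documented
combination rule; the function names are hypothetical and are not the
NetBSD macros themselves:

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature (hypothetical helpers). */
uint32_t sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
uint32_t sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
uint32_t sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base == 0xf. */
uint32_t
display_family(uint32_t sig)
{
	uint32_t bf = sig_base_family(sig);

	return bf == 0xf ? bf + sig_ext_family(sig) : bf;
}

/*
 * Display model: the extended model supplies the high nibble only when
 * the base family is 0x6 or 0xf (assumption: the rule as commonly
 * documented for CPUID).
 */
uint32_t
display_model(uint32_t sig)
{
	uint32_t bf = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (bf == 0x6 || bf == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

Under these assumptions a signature of 0x000906ea decodes to display
family 0x6 and display model 0x9e, and 0x00800f11 decodes to display
family 0x17 and display model 0x01.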


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
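
For readers unfamiliar with the idiom, a power-of-two round-up can be
sketched as below. The name sketch_roundup2 is hypothetical (it is not
the kernel macro itself), and the sketch assumes that the alignment m
is a nonzero power of two; for other values the result is meaningless.

```c
/*
 * Hypothetical sketch: round x up to the next multiple of m,
 * where m is assumed to be a nonzero power of two.
 */
static inline unsigned long sketch_roundup2(unsigned long x, unsigned long m)
{
	/* Adding m - 1 crosses the next boundary; the mask clears the low bits. */
	return (x + (m - 1)) & ~(m - 1);
}
```

With m = 8, the value 13 rounds up to 16, and 16 stays at 16.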


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
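
One common way to compute "the greatest power of two which is a divisor"
of a number is to isolate its lowest set bit. The sketch below shows
that idiom; the function name is hypothetical, and this is not
necessarily how the commit itself implements the workaround.

```c
/*
 * Hypothetical sketch: return the largest power of two that divides n.
 * Isolating the lowest set bit of n gives exactly that value.
 * Returns 0 when n is 0.
 */
static inline unsigned int sketch_pow2_divisor(unsigned int n)
{
	/* 0u - n is the unsigned two's complement negation of n. */
	return n & (0u - n);
}
```

For instance, 24 desired colors would be reduced to 8, and 12 to 4.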


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.176 23-Nov-2019 ad

cpu_need_resched():

- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
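
The truncation hazard described above can be illustrated with a small sketch. The helper names (uimin_sketch, MIN_SKETCH, min_via_uint, truncation_demo) are hypothetical stand-ins written for this example, not the libkern definitions, and the demo assumes a 32-bit unsigned int as on i386 and amd64:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a min() that is defined on unsigned int only. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Macro form: compares in the operands' own type, so there is no
 * truncation, but each argument may be evaluated more than once.
 */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/*
 * Passing 64-bit values to the unsigned int helper compiles without
 * complaint by default; the arguments are silently converted.
 */
static uint64_t
min_via_uint(uint64_t a, uint64_t b)
{
	return uimin_sketch(a, b);
}

/* Aborts if the sketch's assumptions do not hold. */
static void
truncation_demo(void)
{
	uint64_t big = UINT64_C(0x100000001);	/* 2^32 + 1 */

	assert(sizeof(unsigned int) == 4);	/* assumption of this sketch */
	assert(min_via_uint(big, 5) == 1);	/* big was truncated to 1 */
	assert(MIN_SKETCH(big, UINT64_C(5)) == 5);	/* no truncation */
}
```

With an honest name such as uimin, a call site that passes 64-bit values at least advertises that the comparison happens in unsigned int.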


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
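
For readers unfamiliar with the alignment remark above: FXSAVE stores a 512-byte area and faults if its memory operand is not 16-byte aligned, so an operand address with "rdi % 16 = 8" is exactly the failing case. A minimal sketch of that precondition follows; the type and function names are hypothetical and no FXSAVE instruction is actually executed:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define FXSAVE_AREA_SIZE	512	/* bytes written by FXSAVE */
#define FXSAVE_ALIGN		16	/* required operand alignment */

/* A save area whose alignment is guaranteed by its type. */
struct fxsave_area_sketch {
	alignas(FXSAVE_ALIGN) uint8_t bytes[FXSAVE_AREA_SIZE];
};

/* Returns nonzero if p would be an acceptable FXSAVE operand. */
static int
fxsave_operand_ok(const void *p)
{
	return ((uintptr_t)p % FXSAVE_ALIGN) == 0;
}

/*
 * Where real code would execute FXSAVE; this sketch only checks the
 * alignment precondition and returns the buffer.
 */
static uint8_t *
fxsave_target(struct fxsave_area_sketch *area)
{
	assert(fxsave_operand_ok(area));
	return area->bytes;
}
```

Giving the buffer an aligned type, rather than relying on the incoming stack alignment, is one way to make the requirement explicit; whether the kernel does this is not stated in the log.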


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.
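
The same-size constraint can be sketched on an ordinary byte buffer. This is only an illustration under stated assumptions: the function names are hypothetical, the buffer is plain data rather than kernel text, and a real hotpatch additionally has to deal with write protection and instruction-stream serialization, which are not modelled here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_JMP_REL8	0xEB	/* short jump, 8-bit displacement */
#define OP_INT3		0xCC	/* breakpoint, used as padding */

/* Fill buf with "jmp past the end" followed by int3 padding. */
static void
make_placeholder(uint8_t *buf, size_t len)
{
	/* 2 bytes for the jump itself; displacement must fit in rel8. */
	assert(len >= 2 && len - 2 <= 127);
	buf[0] = OP_JMP_REL8;
	buf[1] = (uint8_t)(len - 2);	/* relative to the next instruction */
	memset(buf + 2, OP_INT3, len - 2);
}

/*
 * Overwrite the placeholder with the real bytes, but only if both
 * chunks have exactly the same size. Returns 0 on success, -1 if the
 * sizes differ (in which case dst is left untouched).
 */
static int
hotpatch(uint8_t *dst, size_t dstlen, const uint8_t *src, size_t srclen)
{
	if (dstlen != srclen)
		return -1;
	memcpy(dst, src, srclen);
	return 0;
}
```

Refusing a size mismatch is what keeps the code following the patched region at the address the assembler originally gave it.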

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) is > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
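
As an illustration of the distinction between the base, extended and display values, here is a sketch that decodes a CPUID leaf 1 EAX signature. The function names are hypothetical, and the combination rule follows the convention in the vendors' CPUID documentation (Intel's variant for the model); whether the macros in the tree are written exactly this way is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static uint32_t sig_stepping(uint32_t sig)    { return sig & 0xf; }
static uint32_t sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static uint32_t sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static uint32_t sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field is added only for base family 0xf. */
static uint32_t
display_family(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);

	if (fam == 0xf)
		fam += sig_ext_family(sig);
	return fam;
}

/* Display model: the extended field supplies the upper four bits. */
static uint32_t
display_model(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}

/* Two sample signatures, decoded by hand in the comments. */
static void
cpuid_sig_demo(void)
{
	/* 0x000306a9: base family 6, base model 0xa, extended model 3. */
	assert(display_family(0x000306a9) == 0x6);
	assert(display_model(0x000306a9) == 0x3a);
	/* 0x00600f12: base family 0xf, extended family 6, base model 1. */
	assert(display_family(0x00600f12) == 0x15);
	assert(display_model(0x00600f12) == 0x01);
}
```

Code that compares only the base fields would see family 0xf and model 1 for the second signature, which is the kind of bug a single display-value macro helps avoid.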


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical addresses, they are manipulated as
64-bit variables by the kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.
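
The address-width figures quoted above (36-bit physical, 32-bit virtual) can be checked with a few lines of arithmetic. The names below are hypothetical and exist only for this sketch:

```c
#include <assert.h>
#include <stdint.h>

#define PAE_PADDR_BITS	36	/* physical address width under PAE */
#define I386_VADDR_BITS	32	/* virtual address width, PAE or not */

/* Size in bytes of an address space that is 'bits' wide. */
static uint64_t
addr_space_bytes(unsigned int bits)
{
	assert(bits < 64);
	return UINT64_C(1) << bits;
}

/* Returns nonzero if the physical address fits in a 32-bit variable. */
static int
fits_in_32_bits(uint64_t pa)
{
	return pa <= UINT32_MAX;
}
```

The highest PAE physical address, addr_space_bytes(PAE_PADDR_BITS) - 1, does not fit in 32 bits, which is why physical addresses have to be carried in 64-bit variables once PAE is enabled.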

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objections were raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
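
For readers unfamiliar with the idiom: rounding up to a power-of-two boundary
is usually written as an add followed by a mask. The commit does not show the
expression it replaced, so the following is only a sketch of the general
technique. The function name round_up_pow2 is hypothetical and is not the
kernel's roundup2() macro itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of m, where m must be a power of two.
 * Adding (m - 1) pushes x past the next boundary unless it is already
 * aligned; masking with ~(m - 1) then clears the low bits.
 * Hypothetical helper for illustration only.
 */
static size_t
round_up_pow2(size_t x, size_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);	/* m is a power of two */
	return (x + (m - 1)) & ~(m - 1);
}
```

With this sketch, round_up_pow2(13, 8) yields 16 and an already aligned value
such as 16 is returned unchanged.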


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
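
The "greatest power of two which is a divisor" mentioned above can be computed
with a single bit trick. The following is a minimal sketch of that arithmetic
only; the function name is hypothetical and this is not necessarily how the
commit implements it.

```c
#include <assert.h>

/*
 * Return the largest power of two that divides n (n must be non-zero).
 * In unsigned arithmetic, (0 - n) is the two's complement of n, so
 * n & (0 - n) isolates the lowest set bit of n, which is exactly the
 * largest power-of-two divisor.  Hypothetical helper for illustration.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (0u - n);
}
```

For example, a probed value of 24 colors would be reduced to 8, and 12 to 4,
while a value that is already a power of two is left unchanged.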


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.175 22-Nov-2019 ad

- On-demand zeroing pages with MOVNTI is crazy. It empties L1/L2/L3.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.

Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.


Revision tags: phil-wifi-20191119
# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
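
The two steps above can be modelled in a few lines. This is a toy sketch, not
the kCSan code: the names (struct cell, cell_publish, access_races), the fixed
two-CPU table, and the rule that two reads never conflict are all assumptions
made for illustration. Access sizes, the sampling counter and the delay are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPU_SKETCH 2	/* hypothetical: one cell per CPU */

struct cell {
	bool      valid;	/* cell currently published? */
	uintptr_t addr;		/* address of the sampled access */
	bool      write;	/* was the sampled access a write? */
};

static struct cell cells[NCPU_SKETCH];

/* Step 1: a sampled access publishes a cell (the real code then delays). */
static void
cell_publish(int cpu, uintptr_t addr, bool write)
{
	assert(cpu >= 0 && cpu < NCPU_SKETCH);
	cells[cpu].valid = true;
	cells[cpu].addr = addr;
	cells[cpu].write = write;
}

/* The cell is withdrawn once the delay is over. */
static void
cell_clear(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU_SKETCH);
	cells[cpu].valid = false;
}

/*
 * Step 2: every access checks the cells of the other CPUs.  A conflict
 * is reported when the address matches and at least one side writes
 * (assumption of this sketch).
 */
static bool
access_races(int cpu, uintptr_t addr, bool write)
{
	for (int i = 0; i < NCPU_SKETCH; i++) {
		if (i == cpu || !cells[i].valid)
			continue;
		if (cells[i].addr == addr && (write || cells[i].write))
			return true;
	}
	return false;
}
```

In terms of the example in the text: cpu0 publishes a read cell for an
address, and a write by cpu1 to the same address while that cell is still
valid makes access_races() return true.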

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that the cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
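
To make the truncation concern concrete, here is a small sketch contrasting a
function defined on unsigned int with a type-generic macro. All names
(uimin_sketch, MIN_SKETCH, clamp_with_function, clamp_with_macro) are
hypothetical stand-ins, not the libkern definitions, and the sketch assumes
unsigned int is narrower than 64 bits, as it is on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a min function defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Stand-in for a type-generic macro: evaluates arguments twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

static uint64_t
clamp_with_function(uint64_t len, uint64_t limit)
{
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	/* Both arguments are silently converted to unsigned int here. */
	return uimin_sketch(len, limit);
}

static uint64_t
clamp_with_macro(uint64_t len, uint64_t limit)
{
	/* Comparison happens at full 64-bit width; nothing is lost. */
	return MIN_SKETCH(len, limit);
}
```

With len = 2^32 + 1 and limit = 2^33, the function version compares the
truncated values 1 and 0 and returns 0, whereas the macro version returns
2^32 + 1. For values that fit in 32 bits the two agree.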


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
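
The correspondence between slot 384 and that address follows from 4-level
paging arithmetic: each L4 slot covers 2^39 bytes, and bits 63..48 of a
canonical address are copies of bit 47. A minimal sketch of the computation,
with hypothetical names (L4_SHIFT_SKETCH, l4_slot_to_va) that are not taken
from the pmap headers:

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT_SKETCH 39	/* each L4 slot covers 2^39 bytes */

/*
 * Return the canonical virtual address at which the given L4 slot
 * starts, assuming 48-bit virtual addresses and 512 slots.
 */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va;

	assert(slot < 512);
	va = (uint64_t)slot << L4_SHIFT_SKETCH;
	/* Canonical form: sign-extend bit 47 into bits 63..48. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff) << 48;
	return va;
}
```

Here 384 << 39 is 0x0000c00000000000; bit 47 is set, so the upper 16 bits are
filled in, giving 0xffffc00000000000.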

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
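
As an illustration of the base/extended/display split described above, here
is a minimal sketch in C. It follows the field layout of the CPUID leaf 1
signature (EAX) as commonly documented by Intel and AMD; the function names
are hypothetical and are not the macros from specialreg.h.

```c
#include <stdint.h>

/* Raw fields of the CPUID.1:EAX signature (hypothetical helper names). */
static unsigned int
sig_stepping(uint32_t sig)
{
	return sig & 0xf;		/* bits 3:0 */
}

static unsigned int
sig_base_model(uint32_t sig)
{
	return (sig >> 4) & 0xf;	/* bits 7:4 */
}

static unsigned int
sig_base_family(uint32_t sig)
{
	return (sig >> 8) & 0xf;	/* bits 11:8 */
}

static unsigned int
sig_ext_model(uint32_t sig)
{
	return (sig >> 16) & 0xf;	/* bits 19:16 */
}

static unsigned int
sig_ext_family(uint32_t sig)
{
	return (sig >> 20) & 0xff;	/* bits 27:20 */
}

/* Display family: the extended field only counts when the base is 0xf. */
static unsigned int
sig_display_family(uint32_t sig)
{
	unsigned int fam = sig_base_family(sig);

	if (fam == 0xf)
		fam += sig_ext_family(sig);
	return fam;
}

/*
 * Display model: the extended field supplies the high nibble when the
 * base family is 0x6 or 0xf (assumption: the Intel convention).
 */
static unsigned int
sig_display_model(uint32_t sig)
{
	unsigned int fam = sig_base_family(sig);
	unsigned int model = sig_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

Keeping the raw field accessors separate from the display helpers mirrors
the split in the macro list: callers that only need the base or extended
field are not forced through the combining rule.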


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1) the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2) once 1) above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);

3) pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
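
For readers unfamiliar with the idiom, rounding up to a power-of-two
boundary is usually done with a mask rather than a division. The following
is a minimal sketch of that idiom, assuming the alignment is a power of
two; round_up_pow2 is a hypothetical name used here for illustration and is
not the kernel's roundup2() macro itself.

```c
#include <stdint.h>

/*
 * Round x up to the next multiple of m, where m must be a power of two.
 * Hypothetical helper for illustration; not copied from the kernel.
 */
static uint64_t
round_up_pow2(uint64_t x, uint64_t m)
{
	/* Adding m - 1 and then clearing the low bits avoids a division. */
	return (x + (m - 1)) & ~(m - 1);
}
```

A value that is already aligned is returned unchanged, which is why the
idiom adds m - 1 rather than m.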


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
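
The workaround described above (take the greatest power of two that divides
the desired colour count) can be sketched with a single bit operation. This
is an illustrative sketch only; largest_pow2_divisor is a hypothetical name
and the actual commit may compute the value differently.

```c
/*
 * Return the largest power of two that divides n.
 * Hypothetical helper for illustration; returns 0 for n == 0.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	/*
	 * In unsigned arithmetic, 0 - n is the two's complement of n, so
	 * n & (0 - n) keeps only the lowest set bit of n.
	 */
	return n & (0u - n);
}
```

For example, a probed count of 24 colours would be reduced to 8, and a
count that is already a power of two is left unchanged.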


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.174 05-Nov-2019 maxv

Add Kernel Concurrency Sanitizer (kCSan) support. This sanitizer allows us
to detect race conditions at runtime. It is a variation of TSan that is
easy to implement and more suited to kernel internals, albeit theoretically
less precise than TSan's happens-before.

We do basically two things:

- On every KCSAN_NACCESSES (=2000) memory accesses, we create a cell
describing the access, and delay the calling CPU (10ms).

- On all memory accesses, we verify if the memory we're reading/writing
is referenced in a cell already.

The combination of the two means that, if for example cpu0 does a read that
is selected and cpu1 does a write at the same address, kCSan will fire,
because cpu1's write collides with cpu0's read cell.
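
The collision check described above can be sketched roughly as follows. This is a minimal illustration, not the kCSan source: the structure `cell_sketch` and the function `cell_conflicts` are hypothetical names, and the rule that two reads never conflict is an assumption based on the usual definition of a data race.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cell layout; the real kCSan structures may differ. */
struct cell_sketch {
	uintptr_t addr;		/* start address of the sampled access */
	size_t size;		/* number of bytes accessed */
	bool write;		/* true if the sampled access was a write */
};

/*
 * Return true if a new access overlaps the byte range recorded in the
 * cell and at least one of the two accesses is a write (assumption:
 * read/read overlaps are not reported).
 */
static bool
cell_conflicts(const struct cell_sketch *c, uintptr_t addr, size_t size,
    bool write)
{
	bool overlap = addr < c->addr + c->size && c->addr < addr + size;

	return overlap && (c->write || write);
}
```

In this sketch, the sampled CPU would publish a cell and then sit in its delay loop, while every other access calls `cell_conflicts` against the published cells.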

The coverage of the instrumentation is the same as that of kASan. Also, the
code is organized in a way similar to kASan, so it is easy to add support
for more architectures than amd64. kCSan is compatible with KCOV.

Reviewed by Kamil.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.
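
The difference between the function form and the macro form can be illustrated with a small sketch. The names `uimin_sketch`, `MIN_SKETCH` and `truncated_min` are hypothetical stand-ins, not the libkern definitions, and the example assumes a platform where `unsigned int` is 32 bits wide.

```c
#include <stdint.h>

/* Stand-in for a min() defined on unsigned int (hypothetical name). */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: may evaluate its arguments twice, but never narrows them. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/*
 * What a caller observes when 64-bit values are passed through the
 * unsigned int function: both arguments are narrowed before comparing.
 * The casts only make the implicit conversion visible.
 */
static uint64_t
truncated_min(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}
```

With `a = 0x100000001` and `b = 5`, the function form compares the narrowed values 1 and 5, whereas the macro compares the full 64-bit values.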

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
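
The address quoted above follows from the slot number by simple arithmetic. The sketch below assumes 4-level paging with 48-bit virtual addresses, where each L4 slot covers 2^39 bytes and addresses must be sign-extended from bit 47; the names `L4_SHIFT_SKETCH` and `l4_slot_to_va` are hypothetical and do not come from the pmap code.

```c
#include <stdint.h>

/* Each L4 (PML4) slot spans 2^39 bytes under 4-level paging. */
#define L4_SHIFT_SKETCH 39

/*
 * Convert an L4 slot index (0..511) to the canonical virtual address
 * at which that slot begins.
 */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT_SKETCH;

	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff) << 48;
	return va;
}
```

For slot 384 the shift yields 0x0000c00000000000, and sign extension turns that into 0xffffc00000000000.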

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.
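
How the display values are derived from the base and extended fields can be sketched as below. This is an illustration of the documented CPUID leaf 1 EAX layout, not a copy of the macros in the tree: the function names are hypothetical, and the model rule shown (extended model applied for base family 0x6 or 0xf) follows Intel's documented convention.

```c
#include <stdint.h>

/* Raw fields of CPUID leaf 1 EAX. */
static uint32_t base_family(uint32_t eax) { return (eax >> 8) & 0xf; }
static uint32_t base_model(uint32_t eax)  { return (eax >> 4) & 0xf; }
static uint32_t ext_family(uint32_t eax)  { return (eax >> 20) & 0xff; }
static uint32_t ext_model(uint32_t eax)   { return (eax >> 16) & 0xf; }

/* Display family: the extended family is added only when base is 0xf. */
static uint32_t
display_family(uint32_t eax)
{
	uint32_t fam = base_family(eax);

	return fam == 0xf ? fam + ext_family(eax) : fam;
}

/* Display model: the extended model supplies the upper four bits. */
static uint32_t
display_model(uint32_t eax)
{
	uint32_t fam = base_family(eax);
	uint32_t model = base_model(eax);

	if (fam == 0x6 || fam == 0xf)
		model |= ext_model(eax) << 4;
	return model;
}
```

For example, the signature 0x000906ea decodes to family 0x6, model 0x9e, and 0x00800f11 decodes to family 0x17, model 0x01.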

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@
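
The fallback described above can be sketched roughly as follows. This is
only an illustration: read_tsc_via_msr_stub() and read_cpu_counter_stub()
are hypothetical stand-ins for rdmsr(MSR_TSC) and cpu_counter(), and the
has_msr flag stands in for the CPUID_MSR feature check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for rdmsr(MSR_TSC); returns a fixed marker value. */
uint64_t
read_tsc_via_msr_stub(void)
{
	return 1000;
}

/* Hypothetical stand-in for cpu_counter(); returns a different marker. */
uint64_t
read_cpu_counter_stub(void)
{
	return 2000;
}

/*
 * Sketch of the selection logic: use the MSR read when the CPU
 * advertises MSR support, otherwise fall back to the plain counter.
 */
uint64_t
counter_serializing_sketch(bool has_msr)
{
	if (has_msr)
		return read_tsc_via_msr_stub();
	return read_cpu_counter_stub();
}
```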


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
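
As a quick check of the figures above, the address-space sizes follow
directly from the number of address bits. The helper names below are
made up for this illustration and are not part of the patch.

```c
#include <stdint.h>

/* Number of bytes addressable with the given number of address bits. */
uint64_t
addressable_bytes(unsigned int bits)
{
	return (uint64_t)1 << bits;
}

/* Convert a byte count to whole GiB (1 GiB = 2^30 bytes). */
uint64_t
bytes_to_gib(uint64_t bytes)
{
	return bytes >> 30;
}
```

With 36 physical address bits this gives 64 GiB, and with 32 virtual
address bits it gives 4 GiB, which is why physical addresses no longer fit
in a 32-bit variable.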

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
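
For readers unfamiliar with the idiom, rounding up to a power-of-two
multiple is a mask operation. The macro below is a sketch under that
assumption (m must be a power of two); ROUNDUP2_SKETCH is a name chosen for
this illustration, not the kernel's definition.

```c
/*
 * Round x up to the next multiple of m, where m is a power of two.
 * Adding m - 1 and clearing the low bits avoids a division.
 * Both arguments are evaluated more than once, so avoid side effects.
 */
#define ROUNDUP2_SKETCH(x, m)	(((x) + ((m) - 1)) & ~((m) - 1))
```

For example, ROUNDUP2_SKETCH(13u, 8u) is 16, and a value that is already
aligned is returned unchanged.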


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
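
One way to compute "the greatest power of two which is a divisor" is to
isolate the lowest set bit of the value. The function below is a sketch of
that technique with a made-up name; it is not taken from the committed
code, which may compute the same result differently.

```c
/*
 * Return the greatest power of two that divides n.
 * For unsigned n, n & -n keeps only the lowest set bit, which is
 * exactly the largest power-of-two divisor. Returns 0 for n == 0.
 */
unsigned int
pow2_divisor_sketch(unsigned int n)
{
	return n & (0u - n);
}
```

For instance, 24 colors would be reduced to 8, and 12 to 4, while a value
that is already a power of two is left alone.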


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.173 12-Oct-2019 maxv

Rewrite the FPU code on x86. This greatly simplifies the logic and removes
the dependency on IPL_HIGH. NVMM is updated accordingly. Posted on
port-amd64 a week ago.

Bump the kernel version to 9.99.16.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
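
The truncation hazard can be illustrated with a small sketch. It assumes a
32-bit unsigned int, and every name in it (uimin_sketch, MIN_SKETCH,
clamp_truncating, clamp_exact) is invented for this illustration rather
than taken from libkern.

```c
#include <stdint.h>

/* Same shape as the helper described above: defined on unsigned int. */
unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no conversion, but each argument may be evaluated twice. */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/*
 * A caller holding a 64-bit length. The cast is written out here, but C
 * performs the same narrowing implicitly at the call, so the upper
 * 32 bits are dropped before the comparison happens.
 */
uint64_t
clamp_truncating(uint64_t len, unsigned int limit)
{
	return uimin_sketch((unsigned int)len, limit);
}

/* The macro compares the full 64-bit values. */
uint64_t
clamp_exact(uint64_t len, uint64_t limit)
{
	return MIN_SKETCH(len, limit);
}
```

With len = 0x100000005 and a limit of 100, the function-based version
returns 5 (the low 32 bits of len), while the macro-based version returns
100.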


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
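
The slot-to-address arithmetic can be checked with a short sketch, assuming 4-level paging with 48-bit virtual addresses, where each L4 slot covers 2^39 bytes and bits 63..48 must replicate bit 47. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT	39	/* each L4 slot covers 2^39 bytes (512 GiB) */
#define L4_NSLOTS	512

/* Hypothetical helper: first canonical VA covered by an amd64 L4 slot. */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va;

	assert(slot < L4_NSLOTS);
	va = (uint64_t)slot << L4_SHIFT;
	/* Canonical form: bits 63..48 replicate bit 47. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

For slot 384 the shift gives 0x0000c00000000000, and sign extension yields the address quoted above.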

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Work around the "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) is > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with
AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
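
The difference between the base, extended and display values can be sketched as follows. This is an illustration rather than the macros from the commit: the function names are hypothetical, the field positions are those of the CPUID leaf 1 signature in %eax, and the combination rule follows the Intel convention (extended family added when the base family is 0xf, extended model prepended when the base family is 6 or 0xf).

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature. Names are illustrative. */
static unsigned int sig_stepping(uint32_t sig)   { return sig & 0xf; }
static unsigned int sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned int sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned int sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned int sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field only counts when the base is 0xf. */
static unsigned int
sig_family(uint32_t sig)
{
	unsigned int family = sig_basefamily(sig);

	assert(family <= 0xf);
	if (family == 0xf)
		family += sig_extfamily(sig);
	return family;
}

/* Display model: the extended field supplies the upper four bits. */
static unsigned int
sig_model(uint32_t sig)
{
	unsigned int family = sig_basefamily(sig);
	unsigned int model = sig_basemodel(sig);

	if (family == 0x6 || family == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, the signature 0x000906ea decodes to family 0x6, model 0x9e, stepping 0xa, while 0x00800f11 decodes to family 0x17, model 0x01, stepping 0x1.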


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.
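
Why a fixed-width bitmask caps the CPU count, and how a dynamically sized set avoids it, can be sketched as below. This is a generic bitset under the assumption that the old masks were single 32-bit words; the type and function names are hypothetical and are not the kcpuset(9) interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define WORD_BITS	32

/* Hypothetical CPU set sized at run time instead of one fixed word. */
struct cpuset {
	size_t	 ncpu;
	uint32_t *words;
};

static struct cpuset *
cpuset_create(size_t ncpu)
{
	struct cpuset *cs;

	assert(ncpu > 0);
	cs = malloc(sizeof(*cs));
	if (cs == NULL)
		return NULL;
	cs->ncpu = ncpu;
	/* One bit per CPU, rounded up to whole words, all cleared. */
	cs->words = calloc((ncpu + WORD_BITS - 1) / WORD_BITS,
	    sizeof(*cs->words));
	if (cs->words == NULL) {
		free(cs);
		return NULL;
	}
	return cs;
}

static void
cpuset_set(struct cpuset *cs, size_t cpu)
{
	assert(cpu < cs->ncpu);
	cs->words[cpu / WORD_BITS] |= UINT32_C(1) << (cpu % WORD_BITS);
}

static bool
cpuset_isset(const struct cpuset *cs, size_t cpu)
{
	assert(cpu < cs->ncpu);
	return (cs->words[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1;
}

static void
cpuset_destroy(struct cpuset *cs)
{
	free(cs->words);
	free(cs);
}
```

With this layout, a CPU index such as 200 is just bit 8 of word 6, so nothing in the code depends on the index fitting in a single machine word.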


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@
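
The fallback logic can be sketched with stand-in accessors. Everything here is hypothetical scaffolding: the struct, the function names and the fake readers exist only so the selection rule can run in user space, whereas the real function reads the hardware directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of what the CPU offers. */
struct counter_ops {
	bool	 has_msr;			/* models CPUID_MSR */
	uint64_t (*read_msr_tsc)(void);		/* models rdmsr(MSR_TSC) */
	uint64_t (*read_counter)(void);		/* models cpu_counter() */
};

/* Fake accessors returning fixed values, for illustration only. */
static uint64_t fake_msr_tsc(void) { return 100; }
static uint64_t fake_counter(void) { return 200; }

/*
 * Prefer the MSR read when the CPU supports rdmsr; otherwise fall
 * back to the plain counter read instead of faulting.
 */
static uint64_t
counter_serializing(const struct counter_ops *ops)
{
	assert(ops->read_counter != NULL);
	if (ops->has_msr && ops->read_msr_tsc != NULL)
		return ops->read_msr_tsc();
	return ops->read_counter();
}
```

A CPU that advertises a counter but no MSR support takes the second path, which is the vortex86 case mentioned above.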


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical addresses, they are manipulated as
64-bit variables by the kernel and MMU.
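
The sizes quoted above follow directly from the address widths, and the same arithmetic shows why a 32-bit variable can no longer hold a physical address. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIB	(UINT64_C(1) << 30)

/* Bytes addressable with `bits` address bits. */
static uint64_t
addr_space_bytes(unsigned int bits)
{
	assert(bits < 64);
	return UINT64_C(1) << bits;
}

/* True if the physical address survives a round trip through 32 bits. */
static bool
pa_fits_in_32_bits(uint64_t pa)
{
	return pa == (uint64_t)(uint32_t)pa;
}
```

Here addr_space_bytes(36) is 64 GiB and addr_space_bytes(32) is 4 GiB, and any physical address at or above 4 GiB fails pa_fits_in_32_bits(), which is why the physical address type has to widen.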

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
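
For readers unfamiliar with the idiom, rounding up to a power-of-two multiple is usually a mask operation. The sketch below shows one common formulation under a hypothetical name; it is not claimed to match the exact definition of roundup2() in the NetBSD headers.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of m, where m must be a power of two. */
static uintptr_t
roundup_pow2(uintptr_t x, uintptr_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + m - 1) & ~(m - 1);
}
```

Naming the operation makes the power-of-two requirement explicit, which an open-coded mask expression tends to hide.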


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
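
As an illustration of the workaround described above, the greatest power of
two that divides a count can be isolated with a single bit operation. This is
a minimal sketch, not the code from this commit; the function name
largest_pow2_divisor is hypothetical.

```c
#include <stdint.h>

/*
 * Return the greatest power of two that divides n (n must be nonzero).
 * Adding one to the complement of n yields the two's-complement
 * negation, so n & (~n + 1) keeps only the lowest set bit of n, which
 * is exactly that divisor.
 *
 * Hypothetical helper for illustration; not taken from cpu.c.
 */
static inline uint32_t
largest_pow2_divisor(uint32_t n)
{
	return n & (~n + 1u);
}
```

With this helper, a desired count of 24 colors would be reduced to 8, and a
count that is already a power of two is returned unchanged.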


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.172 30-Aug-2019 mrg

avoid misalignment in 32 bit kernels and "mach cpu".


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
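
The truncation hazard described above can be sketched as follows. This is an
illustrative stand-in, not the libkern source: the helpers clamp_with_uimin
and clamp_with_MIN are hypothetical, and the example assumes unsigned int is
32 bits wide.

```c
#include <stdint.h>

/* Stand-in for a min helper that is defined on unsigned int only. */
static inline unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro; note that it evaluates its arguments twice. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Hypothetical caller with 64-bit operands.  The casts spell out the
 * conversion that would otherwise happen silently at the call site.
 */
static uint64_t
clamp_with_uimin(uint64_t len, uint64_t limit)
{
	return uimin((unsigned int)len, (unsigned int)limit);
}

/* The same clamp using the macro, which keeps the full 64-bit width. */
static uint64_t
clamp_with_MIN(uint64_t len, uint64_t limit)
{
	return MIN(len, limit);
}
```

Under the 32-bit assumption, clamping 0x100000005 to a limit of 100 through
the function yields 5, because the upper bits are discarded before the
comparison, while the macro yields 100.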


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

branches: 1.158.2;
Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
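
The correspondence between L4 slot 384 and that address can be checked with a
short calculation. The sketch below assumes standard 4-level x86-64 paging,
where the L4 index occupies virtual address bits 47:39 and bits 63:48 must
replicate bit 47; the names L4_SHIFT and l4_slot_to_va are hypothetical and
not taken from the pmap code.

```c
#include <stdint.h>

#define L4_SHIFT	39	/* each L4 slot covers 2^39 bytes (512 GB) */

/*
 * Return the lowest canonical virtual address mapped by an L4 slot
 * (0..511).  Hypothetical helper for illustration only.
 */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT;

	/* Sign-extend bit 47 into bits 63:48 to form a canonical address. */
	if (va & (1ULL << 47))
		va |= 0xffff000000000000ULL;
	return va;
}
```

For slot 384 (0x180), the shift gives 0x0000c00000000000, and sign extension
turns that into 0xffffc00000000000.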

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
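
The distinction between the base fields and the display values can be
sketched as below. This follows the usual layout of the CPUID leaf 1 EAX
signature (stepping in bits 3:0, base model in 7:4, base family in 11:8,
extended model in 19:16, extended family in 27:20). The function names are
hypothetical and are not the definitions of the macros listed above.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base family is 0xf. */
static inline uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);

	return fam == 0xf ? fam + sig_extfamily(sig) : fam;
}

/*
 * Display model: the extended model supplies the high nibble when the
 * base family is 0x6 or 0xf; otherwise only the base model is used.
 */
static inline uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, a signature of 0x000306a9 decodes to display family 0x6, display
model 0x3a and stepping 9, while 0x00800f11 decodes to display family 0x17
and display model 0x01.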


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
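
The sizes quoted above follow directly from the address widths. A minimal sketch of the arithmetic, using a hypothetical helper `addr_space_bytes()` that is not part of the kernel:

```c
#include <assert.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

/* Hypothetical helper: bytes addressable with the given number of address bits (bits < 64). */
static inline uint64_t addr_space_bytes(unsigned int bits)
{
	return UINT64_C(1) << bits;
}

void pae_examples(void)
{
	/* 36-bit physical addresses cover 64 GiB. */
	assert(addr_space_bytes(36) == 64 * GiB);

	/* 32-bit virtual addresses still cover only 4 GiB. */
	assert(addr_space_bytes(32) == 4 * GiB);

	/* The highest 36-bit physical address is changed by truncation to 32 bits. */
	uint64_t pa = addr_space_bytes(36) - 1;
	assert((uint64_t)(uint32_t)pa != pa);
}
```

The last assertion is the reason physical addresses have to be carried in 64-bit variables once PAE is enabled.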

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
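
For context, a sketch of the kind of operation such a helper replaces, assuming it rounds a value up to a multiple of a power of two. The function name `round_up_pow2()` is hypothetical and is not the kernel macro:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round x up to the next multiple of m, where m is a power of two. */
static inline size_t round_up_pow2(size_t x, size_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);	/* m must be a power of two */
	return (x + m - 1) & ~(m - 1);
}

void round_up_examples(void)
{
	assert(round_up_pow2(13, 8) == 16);
	assert(round_up_pow2(16, 8) == 16);	/* already aligned: unchanged */
	assert(round_up_pow2(0, 64) == 0);
	assert(round_up_pow2(65, 64) == 128);
}
```

Naming the operation makes the power-of-two requirement visible at the call site, which an open-coded mask expression does not.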


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
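
The selection rule described above can be sketched as a function pointer chosen once and called through a macro. All names here (`idle_hook`, `select_idle()`, the three placeholder routines) are hypothetical and only illustrate the dispatch pattern. They are not the kernel's symbols, and the routine bodies are empty stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

typedef void (*idle_fn)(void);

/* Placeholder idle routines; real ones would yield to Xen, use MWAIT, or execute HLT. */
static void idle_xen(void)   { }
static void idle_mwait(void) { }
static void idle_halt(void)  { }

/* The hook the macro calls through; halt is the conservative default. */
static idle_fn idle_hook = idle_halt;
#define my_cpu_idle() ((*idle_hook)())

/* Xen first, then MWAIT if supported and the CPU is not AMD, otherwise halt. */
static idle_fn select_idle(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		return idle_xen;
	if (has_mwait && !is_amd)
		return idle_mwait;
	return idle_halt;
}

void idle_examples(void)
{
	assert(select_idle(true, true, false) == idle_xen);
	assert(select_idle(false, true, false) == idle_mwait);
	assert(select_idle(false, true, true) == idle_halt);	/* AMD: MWAIT is skipped */
	assert(select_idle(false, false, false) == idle_halt);

	/* Install the choice once, then idle through the macro. */
	idle_hook = select_idle(false, true, true);
	my_cpu_idle();
}
```

Making the choice once at setup keeps the idle path itself free of feature checks.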


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
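
The "greatest power of two which is a divisor" rule above has a compact bit-level form: it is the lowest set bit of the value. A minimal sketch with a hypothetical helper name, not the code from the commit:

```c
#include <assert.h>

/* Hypothetical helper: largest power of two that divides n (n must be nonzero). */
static inline unsigned int pow2_divisor(unsigned int n)
{
	assert(n != 0);
	/* n & -n isolates the lowest set bit; unsigned negation is well defined. */
	return n & (0u - n);
}

void pow2_divisor_examples(void)
{
	assert(pow2_divisor(12) == 4);	/* 12 = 4 * 3 */
	assert(pow2_divisor(24) == 8);	/* 24 = 8 * 3 */
	assert(pow2_divisor(16) == 16);	/* already a power of two */
	assert(pow2_divisor(7) == 1);	/* odd values collapse to 1 */
}
```

With 12 reported colors, for example, this yields 4 colors, so code that masks with (ncolors - 1) keeps working.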


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.171 29-May-2019 maxv

Add PCID support in SVS. This avoids TLB flushes during kernel<->user
transitions, which greatly reduces the performance penalty introduced by
SVS.

We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages
in both ASIDs.

The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether
SVS+PCID is in use.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
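
The truncation hazard motivating the rename can be shown in a few lines. This is a sketch under stated assumptions: `my_uimin()` and `MY_MIN()` are local stand-ins, not the libkern definitions, and `unsigned int` is assumed to be 32 bits wide. The explicit casts reproduce what an implicit argument conversion would do silently.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in with the semantics the message describes: defined on unsigned int. */
static inline unsigned int my_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form as quoted in the message: no truncation, but arguments may be evaluated twice. */
#define MY_MIN(a, b) ((a) < (b) ? (a) : (b))

void uimin_examples(void)
{
	uint64_t big = UINT64_C(0x100000001);	/* 2^32 + 1 */
	uint64_t small = 5;

	/* Converted to a 32-bit unsigned int, big becomes 1, so the "minimum" is wrong. */
	assert(my_uimin((unsigned int)big, (unsigned int)small) == 1);

	/* The macro compares at full width and returns the true minimum. */
	assert(MY_MIN(big, small) == 5);
}
```

A name such as uimin states the operand type at the call site, so passing a 64-bit value to it stands out in review.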


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
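
The address follows from the slot number: with 4-level paging each L4 slot
covers 2^39 bytes, and bit 47 is sign-extended into the upper bits to form a
canonical address. A small sketch of that arithmetic (the helper name
l4_slot_to_va and the constant L4_SHIFT_SKETCH are hypothetical, not kernel
identifiers):

```c
#include <stdint.h>

#define L4_SHIFT_SKETCH 39	/* each L4 slot spans 2^39 bytes (512 GiB) */

/* Return the canonical virtual address at which an L4 slot begins. */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT_SKETCH;

	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

For slot 384 this gives 384 << 39 = 0x0000c00000000000, which becomes
0xffffc00000000000 after sign extension.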

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Work around the "lapic_set_lvt: bad pin value %d" panic on some systems with
a (broken?) BIOS. Don't panic when a local APIC's interrupt input pin number
(LINTx) is > 1. Instead, print a warning message and continue. The default is
pin 1. Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
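
As a rough illustration of the base/extended/display distinction above, the
sketch below decodes a CPUID leaf 1 signature using the commonly documented
convention: the extended family is added only when the base family is 0xf,
and the extended model is prepended when the base family is 0x6 or 0xf. The
function names are hypothetical; the actual macro definitions are not shown
in this log and may differ in detail.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature (EAX). */
static unsigned int sig_stepping(uint32_t sig)   { return sig & 0xf; }
static unsigned int sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static unsigned int sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static unsigned int sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static unsigned int sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field counts only for base family 0xf. */
static unsigned int
sig_display_family(uint32_t sig)
{
	unsigned int base = sig_basefamily(sig);

	return base == 0xf ? base + sig_extfamily(sig) : base;
}

/* Display model: the extended field counts for base family 0x6 or 0xf. */
static unsigned int
sig_display_model(uint32_t sig)
{
	unsigned int base = sig_basefamily(sig);
	unsigned int model = sig_basemodel(sig);

	if (base == 0x6 || base == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, the signature 0x000306a9 decodes to family 0x6, model 0x3a,
stepping 9, and 0x00100f42 decodes to family 0x10, model 0x4, stepping 2.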


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
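
For reference, rounding up to a power-of-two boundary is usually written as
an add-and-mask. The sketch below shows that idiom under a hypothetical
name, roundup2_sketch; the commit does not show which hardcoded expression
was replaced, and the kernel's roundup2() is a macro rather than a function.

```c
#include <stdint.h>

/*
 * Round x up to the next multiple of m, where m must be a power of two.
 * Adding (m - 1) carries x past the boundary unless it is already
 * aligned; masking with ~(m - 1) then clears the low bits.
 */
static uint64_t
roundup2_sketch(uint64_t x, uint64_t m)
{
	return (x + (m - 1)) & ~(m - 1);
}
```

For example, roundup2_sketch(13, 8) yields 16, while an already aligned
value such as 16 is returned unchanged.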


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch posted on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
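
The workaround described above can be sketched as follows. This is a
minimal illustration, not the committed code: the function name
largest_pow2_divisor is hypothetical, and it relies only on the fact
that the greatest power of two dividing n is the lowest set bit of n.

```c
#include <assert.h>

/*
 * Hypothetical helper: return the greatest power of two that divides n.
 * For unsigned n, (~n + 1) is the two's complement negation, so
 * n & (~n + 1) isolates the lowest set bit of n.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

For example, a probed count of 24 colors would be reduced to 8, while a
count that is already a power of two is left as it is.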


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.170 27-May-2019 maxv

Change the effect of SVS on the TLB. Keep CR4_PGE set when SVS is enabled,
but don't use PTE_G on the kernel PTEs in general.

Add PTE_G on only a few pages, that are already leaked to userland and do
not contain secrets.

This slightly improves syscall performance.


# 1.169 27-May-2019 maxv

Remove 'ci_svs_kpdirpa', unused. While here fix a few comments here and
there, reduces a future diff.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial
console, enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that cpu_attach() doesn't run to
completion because it thinks it has run past maxcpus (which, in the
case of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.
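
A minimal sketch of the truncation hazard, assuming a platform where
unsigned int is 32 bits wide. All names here (demo_uimin, demo_min64,
clamp_len_truncating) are hypothetical and exist only for illustration;
they are not the libkern definitions.

```c
#include <stdint.h>

/* Stand-in for a min() that is defined on unsigned int only. */
static unsigned int
demo_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* A 64-bit minimum, for comparison. */
static uint64_t
demo_min64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Passing 64-bit values to the unsigned int version converts them
 * implicitly, silently dropping the high 32 bits of each argument.
 */
static uint64_t
clamp_len_truncating(uint64_t len, uint64_t limit)
{
	return demo_uimin(len, limit);
}
```

With len = 0x100000001 and limit = 5, the truncating version compares 1
against 5 and returns 1, whereas the 64-bit minimum is 5.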

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
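
The slot-to-address arithmetic can be checked with a small sketch. It
assumes 4-level paging with 48-bit canonical virtual addresses, where
each L4 slot covers 2^39 bytes; the names L4_SHIFT and l4_slot_to_va are
hypothetical and not taken from the pmap code.

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT 39	/* assumed: each L4 slot spans 2^39 bytes */

/*
 * Hypothetical helper: base virtual address covered by an L4 slot.
 * Addresses with bit 47 set are sign-extended into bits 63..48 to
 * form a canonical address.
 */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va;

	assert(slot < 512);
	va = (uint64_t)slot << L4_SHIFT;
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

Slot 384 is 0x180, so 0x180 << 39 sets bits 47 and 46, and sign
extension gives the address quoted above.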

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
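
As a rough illustration of the base/extended/display distinction listed above, the following sketch decodes a CPUID leaf 1 signature using the combination rules documented by Intel and AMD. The function names (sig_family() and friends) are hypothetical and are not the macros from this commit; the actual CPUID_TO_* definitions may differ in detail.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature (the value returned in %eax). */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field is added only when the base is 0xf. */
static inline uint32_t
sig_family(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);

	return fam == 0xf ? fam + sig_extfamily(sig) : fam;
}

/*
 * Display model: the extended model becomes the high nibble when the
 * base family is 6 or 0xf; otherwise only the base model is used.
 */
static inline uint32_t
sig_model(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

With these helpers, a signature of 0x000306a9 decodes to family 6, model 0x3a, stepping 9, and 0x00100f42 decodes to family 0x10, model 4.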


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
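
For readers unfamiliar with the idiom, here is a minimal sketch of the power-of-two round-up that roundup2() stands for. The function name round_up_pow2() is hypothetical, and the real macro in the NetBSD headers may be written differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of align.
 * Only valid when align is a non-zero power of two.
 */
static inline uintptr_t
round_up_pow2(uintptr_t x, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}
```

The mask trick avoids a division, which is why open-coded copies of this expression tend to appear in alignment-sensitive code.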


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
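
The selection order described above could be sketched roughly as follows. All names here (idle_fn_t, select_idle(), idle_hook, cpu_idle_sketch()) are hypothetical stand-ins, and the idle routines are empty stubs; this is not the code from the commit.

```c
#include <stdbool.h>

typedef void (*idle_fn_t)(void);

/* Stubs standing in for the real idle routines. */
static void idle_xen(void)   { }
static void idle_mwait(void) { }
static void idle_halt(void)  { }

/*
 * Pick an idle routine: Xen first, then MWAIT if the CPU supports it
 * and is not AMD, and HLT as the fallback.
 */
static idle_fn_t
select_idle(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		return idle_xen;
	if (has_mwait && !is_amd)
		return idle_mwait;
	return idle_halt;
}

/* The idle entry point is a macro that calls through the pointer. */
static idle_fn_t idle_hook = idle_halt;
#define cpu_idle_sketch()	((*idle_hook)())
```

Assigning idle_hook once at boot keeps the per-idle cost to a single indirect call rather than a chain of feature tests.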


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
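
One compact way to compute "the greatest power of two which is a divisor" is to isolate the lowest set bit. The sketch below shows that trick; the function name is hypothetical and the commit itself may well use a loop instead.

```c
#include <assert.h>

/* Largest power of two that divides n (n must be non-zero). */
static inline unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (0u - n);	/* isolates the lowest set bit */
}
```

For example, a cache probe reporting 24 colors would be reduced to 8, and one reporting 12 colors to 4.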


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
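
A minimal userland sketch of the idea, using C11 <stdatomic.h> rather than the kernel's own atomic primitives. The structure, flag values and helper names are all hypothetical; they only illustrate why a read-modify-write on a shared flags word should be a single atomic operation.

```c
#include <stdatomic.h>
#include <stdint.h>

#define CPUF_SKETCH_PRESENT	0x1u	/* hypothetical flag values */
#define CPUF_SKETCH_RUNNING	0x2u

struct cpu_sketch {
	_Atomic uint32_t flags;
};

/* Set bits without losing a concurrent update from another CPU. */
static inline void
cpu_flags_set(struct cpu_sketch *ci, uint32_t bits)
{
	atomic_fetch_or(&ci->flags, bits);
}

/* Clear bits, again as one indivisible read-modify-write. */
static inline void
cpu_flags_clear(struct cpu_sketch *ci, uint32_t bits)
{
	atomic_fetch_and(&ci->flags, ~bits);
}
```

A plain `ci->flags |= bits` compiles to separate load and store steps, so two CPUs updating different bits at once could overwrite each other's change.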


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


Revision tags: isaki-audio2-base
# 1.168 09-Mar-2019 maxv

Start replacing the x86 PTE bits.


# 1.167 15-Feb-2019 nonaka

Added Microsoft Hyper-V support. It was ported from OpenBSD and FreeBSD.

The graphical console does not work on Gen.2 VMs yet. To use the serial console,
enter "consdev com,0x3f8,115200" on efiboot.


# 1.166 14-Feb-2019 cherry

Welcome XENPVHVM mode.

It is UP only, has xbd(4) and xennet(4) as PV drivers.

The console is com0 at isa and the native portion is very
rudimentary AT architecture, so is probably suboptimal to
run without PV support.


# 1.165 14-Feb-2019 cherry

Fix NLAPIC, NISA and NIOAPIC related conditional compile errors.

This will allow us to now compile an amd64 kernel without PCI.

No functional changes.


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.164 04-Dec-2018 cherry

Hypothetically speaking, if one were to want to compile a

'no options MULTIPROCESSOR'

kernel, these files may trip up the build.

Fix them by moving around the #defines as originally intended.

No Functional Changes.


# 1.163 04-Dec-2018 cherry

Stop panic()ing on a UP system.

The reason for the panic is that the cpu_attach() doesn't run to
completion because it thinks it's run past maxcpus (which, in the case
of UP, is 1).

This is because on x86 at least, mi_cpu_attach() is called *before*
configure() (and thus the cpu_match()/cpu_attach() pair). Thus ncpu
has already been incremented by the time MD cpu_attach() is called.

Fix this.


Revision tags: pgoyette-compat-1126
# 1.162 12-Nov-2018 maxv

Add a comment explaining an important rule. Just to better highlight that
this rule is actually not respected.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.161 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
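
The truncation hazard described above can be illustrated with a small userland
sketch. The names uimin_sketch, min_u64 and truncating_min are hypothetical
stand-ins, not the libkern functions, and the example assumes a platform where
unsigned int is 32 bits wide:

```c
#include <stdint.h>

/* Hypothetical stand-in for a min() defined on unsigned int only. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Width-preserving comparison for 64-bit operands. */
static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Passing 64-bit values to the unsigned int version compiles without
 * complaint by default, but the arguments are converted to unsigned int
 * first, so the upper 32 bits are dropped before the comparison.
 */
static unsigned int
truncating_min(uint64_t a, uint64_t b)
{
	return uimin_sketch(a, b);
}
```

With a = 0x100000001 and b = 2, truncating_min() compares 1 against 2 and
returns 1, while min_u64() returns 2. A name such as uimin makes the operand
width visible at the call site.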


Revision tags: pgoyette-compat-0728
# 1.160 26-Jul-2018 maxv

Remove useless/outdated comments. No functional change.


# 1.159 12-Jul-2018 maxv

Oh. Don't call svs_pdir_switch if SVS is disabled, that's not needed.

I was playing around with PMCs, and was wondering why some cache misses
were occurring in svs_pdir_switch while I had SVS disabled.


Revision tags: phil-wifi-base pgoyette-compat-0625
# 1.158 22-Jun-2018 maxv

Revert jdolecek's changes related to FXSAVE. They just didn't make any
sense and were trying to hide a real bug, which is, that there is for some
reason a wrong stack alignment that causes FXSAVE to fault in
fpuinit_mxcsr_mask. As seen in current-users@ yesterday, rdi % 16 = 8. And
as seen several months ago, as well.

The rest of the changes in XSAVE are wrong too, but I'll let him fix these
ones.
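
For context: FXSAVE requires its memory operand to be 16-byte aligned and
faults otherwise, which is why rdi % 16 = 8 is fatal here. A minimal sketch of
that check, using a hypothetical helper name (fxsave_operand_ok is not a
kernel function):

```c
#include <stdbool.h>
#include <stdint.h>

/* FXSAVE needs a 16-byte aligned save area. */
#define FXSAVE_ALIGN	16u

/* Return true if addr could be used as an FXSAVE operand. */
static bool
fxsave_operand_ok(uintptr_t addr)
{
	return (addr % FXSAVE_ALIGN) == 0;
}
```

An address whose low four bits are 0x8 fails this check, matching the
misaligned stack case reported on current-users@.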


# 1.157 20-Jun-2018 jdolecek

as a stop-gap, make fpuinit_mxcsr_mask() for native independent of
XSAVE as it should be, only xen case checks the flag now; need to
investigate further why exactly the fault happens for the xen
no-xsave case

pointed out by maxv


# 1.156 19-Jun-2018 jdolecek

fix FPU initialization on Xen to allow e.g. AVX when supported by hardware;
only use XSAVE when the CPUID OSXSAVE bit is set, as this seems to be
a reliable indication

tested with Xen 4.2.6 DOM0/DOMU on Intel CPU, without and with no-xsave flag,
so should work also on those AMD CPUs, which have XSAVE disabled by default;
also tested with Xen DOM0 4.8.3

fixes PR kern/50332 by Torbjorn Granlund; sorry it took three years to address

XXX pullup netbsd-8


Revision tags: pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407
# 1.155 05-Apr-2018 maxv

Call cpu_speculation_init on i386 too. We don't have IBRS for i386, but
we do have the AMD DIS_IND method.


# 1.154 04-Apr-2018 maxv

Enable the SpectreV2 mitigation by default at boot time.


Revision tags: pgoyette-compat-0330
# 1.153 28-Mar-2018 maxv

Move the SpectreV2 mitigation code into a dedicated spectre.c file. The
content of the file is taken from the end of cpu.c, and is copied as-is.


Revision tags: pgoyette-compat-0322
# 1.152 15-Mar-2018 maxv

Remove #ifdef XEN (Xen has its own cpu.c), and add a comment.


Revision tags: pgoyette-compat-0315
# 1.151 14-Mar-2018 maxv

Spectre V2 mitigation for certain families of AMD CPUs.

A new sysctl is added, machdep.spectreV2.mitigated, that controls whether
Spectre V2 is mitigated. For now it defaults to "false".

The code is written in such a way that there can be several methods. For
now only one method is supported, on AMD Families 10h, 12h and 16h, where
an MSR is available to disable branch prediction entirely.

Compile-tested on Intel, AMD will be tested soon.


# 1.150 11-Mar-2018 maxv

Explain the TSC drift thing.


Revision tags: pgoyette-compat-base
# 1.149 22-Feb-2018 maxv

branches: 1.149.2;
Remove svs_pgg_update(). Instead of manually changing PG_G on each page,
we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.

In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).

Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.


# 1.148 22-Feb-2018 maxv

Add a dynamic detection for SVS.

The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:

jmp 1f
int3
int3
int3
... int3 ...
1:

gets hotpatched to:

movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp

These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.

In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.

The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.

While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.

When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.

A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.

More changes to come, svs_init() is not very clean.
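
The same-size patching idea can be sketched in userland on a plain byte
buffer. This is only an illustration under stated assumptions: slot_init and
slot_patch are hypothetical names, and a real kernel hotpatch also has to deal
with write protection and instruction-stream serialization, which are not
shown:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_JMP_SHORT	0xEB	/* jmp rel8 */
#define OP_INT3		0xCC	/* traps if ever executed */

/*
 * Fill a slot with "jmp 1f" followed by int3 padding, so that execution
 * skips the whole slot. rel8 is relative to the end of the 2-byte jmp,
 * hence len - 2; the slot must therefore be 2..129 bytes long.
 */
static int
slot_init(uint8_t *slot, size_t len)
{
	if (len < 2 || len > 129)
		return -1;
	slot[0] = OP_JMP_SHORT;
	slot[1] = (uint8_t)(len - 2);
	memset(slot + 2, OP_INT3, len - 2);
	return 0;
}

/* Overwrite the slot only if the replacement has exactly the same size. */
static int
slot_patch(uint8_t *slot, size_t slot_len, const uint8_t *code,
    size_t code_len)
{
	if (code_len != slot_len)
		return -1;
	memcpy(slot, code, code_len);
	return 0;
}
```

Refusing a replacement of a different length mirrors the requirement that the
two chunks of code be of the exact same size.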


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
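
The slot-to-address arithmetic can be checked with a short sketch. It assumes
4-level paging with 48-bit canonical virtual addresses, where each L4 slot
covers 2^39 bytes; l4_slot_to_va is a hypothetical helper, not a pmap
function:

```c
#include <stdint.h>

#define L4_SHIFT	39	/* each L4 slot maps 512 GiB */
#define VA_BITS		48	/* implemented virtual address bits */

/* Base virtual address covered by a given L4 slot (0..511). */
static uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va = (uint64_t)slot << L4_SHIFT;

	/* Sign-extend bit 47 to form a canonical address. */
	if (va & (1ULL << (VA_BITS - 1)))
		va |= ~((1ULL << VA_BITS) - 1);
	return va;
}
```

For slot 384, the shift yields 0x0000c00000000000; bit 47 is set, so sign
extension gives 0xffffc00000000000.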

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
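
The difference between the base fields and the display values can be sketched
as follows. The function names are hypothetical and this is not the macro text
from the tree; it follows the usual CPUID leaf 1 EAX convention, where the
extended family is added only when the base family is 0xF and the extended
model is used only when the base family is 6 or 0xF:

```c
#include <stdint.h>

/* Raw fields of CPUID leaf 1 EAX. */
static unsigned int
sig_stepping(uint32_t sig)
{
	return sig & 0xf;
}

static unsigned int
sig_basemodel(uint32_t sig)
{
	return (sig >> 4) & 0xf;
}

static unsigned int
sig_basefamily(uint32_t sig)
{
	return (sig >> 8) & 0xf;
}

static unsigned int
sig_extmodel(uint32_t sig)
{
	return (sig >> 16) & 0xf;
}

static unsigned int
sig_extfamily(uint32_t sig)
{
	return (sig >> 20) & 0xff;
}

/* Display family: add the extended family only for base family 0xF. */
static unsigned int
sig_family(uint32_t sig)
{
	unsigned int fam = sig_basefamily(sig);

	return fam == 0xf ? fam + sig_extfamily(sig) : fam;
}

/* Display model: prepend the extended model for base family 6 or 0xF. */
static unsigned int
sig_model(uint32_t sig)
{
	unsigned int fam = sig_basefamily(sig);
	unsigned int model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, a signature of 0x000906ea decodes to display family 6, display
model 0x9e, stepping 0xa, and 0x00800f11 decodes to display family 0x17,
display model 1, stepping 1.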


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
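
The fallback logic described above can be sketched in userland with stubbed
readers. Everything here is hypothetical illustration: the struct, the stub
functions and their fixed return values stand in for the real feature test and
for rdmsr(MSR_TSC) and cpu_counter(), which cannot be called from a user
program:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature summary for one CPU. */
struct cpu_caps_sketch {
	bool has_msr;	/* CPUID_MSR present: rdmsr is usable */
};

/* Stub standing in for rdmsr(MSR_TSC). */
static uint64_t
tsc_via_msr_stub(void)
{
	return 1000;
}

/* Stub standing in for cpu_counter(). */
static uint64_t
tsc_plain_stub(void)
{
	return 2000;
}

/* Prefer the MSR read when available, otherwise use the plain counter. */
static uint64_t
counter_serializing_sketch(const struct cpu_caps_sketch *caps)
{
	if (caps->has_msr)
		return tsc_via_msr_stub();
	return tsc_plain_stub();
}
```

A CPU without the MSR feature takes the second path, so the counter stays
usable instead of faulting on rdmsr.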


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by kernel and MMU.
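
The sizes quoted above follow directly from the address widths. A small
sketch, with a hypothetical helper name, that computes the addressable range
for a given number of address bits (valid for widths below 64):

```c
#include <stdint.h>

/* Number of bytes addressable with the given address width (bits < 64). */
static uint64_t
addr_space_bytes(unsigned int bits)
{
	return 1ULL << bits;
}
```

addr_space_bytes(36) is 2^36 bytes, that is 64 GiB, and addr_space_bytes(32)
is 4 GiB. The 36-bit result no longer fits in a 32-bit integer, which is why
physical addresses have to be carried in 64-bit variables under PAE.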

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
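
For readers unfamiliar with the idiom, rounding up to a power-of-two
boundary can be sketched as below. This is a minimal illustration, not the
kernel's macro: the function name round_up_pow2 is hypothetical, and the
alignment is assumed to be a non-zero power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a roundup2()-style helper: round x up to the
 * next multiple of align.  Only valid when align is a power of two.
 */
size_t
round_up_pow2(size_t x, size_t align)
{
	/* A power of two has exactly one bit set. */
	assert(align != 0 && (align & (align - 1)) == 0);

	/* Add align-1, then clear the low bits to land on a boundary. */
	return (x + (align - 1)) & ~(align - 1);
}
```

Using a single named helper in place of the open-coded add-and-mask makes the
power-of-two assumption visible at the call site.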


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
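
The workaround described above (take the greatest power of two that divides
the desired colour count) can be sketched as follows. This is an illustrative
sketch only: the helper name largest_pow2_divisor is hypothetical and is not
taken from cpu.c, and the bit trick shown is one common way to compute the
value, not necessarily the one the commit used.

```c
#include <assert.h>

/*
 * Hypothetical helper: return the largest power of two that divides n.
 * n must be non-zero.
 */
unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);

	/*
	 * n & -n isolates the lowest set bit of n, which is exactly the
	 * largest 2^k that divides n (unsigned negation wraps, so this is
	 * well defined).
	 */
	return n & (0u - n);
}
```

For example, a probed count of 24 colours would be reduced to 8, while a
count that is already a power of two is returned unchanged.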


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.147 27-Jan-2018 maxv

Add SMAP support for i386.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify around cpu_identify() to not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
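
The difference between the base fields and the display values can be
sketched as below. This is a hedged illustration based on the usual CPUID
leaf 1 EAX layout (stepping in bits 3:0, base model in 7:4, base family in
11:8, extended model in 19:16, extended family in 27:20); the function names
are hypothetical and are not the macros from specialreg.h, whose exact
definitions are not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature (hypothetical helper names). */
uint32_t cpuid_stepping(uint32_t sig)    { return sig & 0xf; }
uint32_t cpuid_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
uint32_t cpuid_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
uint32_t cpuid_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
uint32_t cpuid_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family only counts when base family is 0xf. */
uint32_t
cpuid_display_family(uint32_t sig)
{
	uint32_t base = cpuid_base_family(sig);

	return base == 0xf ? base + cpuid_ext_family(sig) : base;
}

/*
 * Display model: the extended model supplies the high nibble when the
 * base family is 0x6 or 0xf; otherwise only the base model is used.
 */
uint32_t
cpuid_display_model(uint32_t sig)
{
	uint32_t family = cpuid_base_family(sig);
	uint32_t model = cpuid_base_model(sig);

	if (family == 0x6 || family == 0xf)
		model |= cpuid_ext_model(sig) << 4;
	return model;
}
```

With a signature of 0x000306a9 the sketch yields family 0x6, model 0x3a and
stepping 9; with 0x00600f12 it yields family 0x15, model 0x01 and stepping 2.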


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
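
The sizes quoted above follow directly from the address widths. A small sketch in C, using hypothetical helper names, checks the arithmetic and shows why a 36-bit physical address needs a 64-bit variable:

```c
#include <assert.h>
#include <stdint.h>

/* Size, in GiB, of an address space with the given number of address bits. */
static inline uint64_t
addr_space_gib(unsigned int bits)
{
	assert(bits >= 30 && bits < 64);
	return (UINT64_C(1) << bits) >> 30;
}

/* Nonzero if a physical address can be held in a 32-bit variable. */
static inline int
fits_in_32_bits(uint64_t pa)
{
	return pa <= UINT32_MAX;
}
```

With 36 bits the helper yields 64 (GiB) and with 32 bits it yields 4 (GiB); any physical address at or above 4 GiB fails fits_in_32_bits().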

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
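
For readers unfamiliar with the idiom, rounding up to a power-of-two boundary can be sketched in C as follows. round_up_pow2() is a hypothetical stand-in written for this note, not the kernel's roundup2() macro itself, and it assumes the boundary is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of m, where m must be a power of two.
 * Adding m - 1 and masking off the low bits avoids a division.
 */
static inline uintptr_t
round_up_pow2(uintptr_t x, uintptr_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	return (x + (m - 1)) & ~(m - 1);
}
```

Values already on a boundary are returned unchanged, so the helper is safe to apply repeatedly.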


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
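
The workaround described above amounts to isolating the lowest set bit of the colour count. A minimal sketch in C, with a hypothetical function name, might look like this:

```c
#include <assert.h>

/*
 * Greatest power of two that divides n. In two's-complement arithmetic,
 * n & (~n + 1) keeps only the lowest set bit of n, which is exactly that
 * divisor.
 */
static inline unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

For example, a probed count of 24 colours would be reduced to 8, and a count that is already a power of two is left unchanged.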


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.146 11-Jan-2018 maxv

Introduce a new svs_page_add function, which can be used to map in the user
space a VA from the kernel space.

Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).

This way, the pcpu areas of the remote CPUs are not mapped in userland.


# 1.145 11-Jan-2018 msaitoh

Changing CR4 register may change cpuid values. For example, setting
CR4_OSXSAVE sets CPUID2_OSXSAVE. The CPUID2_OSXSAVE is in ci_feat_val[1],
so update it after changing CR4.


# 1.144 07-Jan-2018 maxv

Add a new option, SVS (for Separate Virtual Space), that unmaps kernel
pages when running in userland. For now, only the PTE area is unmapped.

Sent on tech-kern@.


# 1.143 07-Jan-2018 maxv

Use uvm_km_alloc instead of kmem_zalloc.


# 1.142 05-Jan-2018 maxv

Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not
Xen.

With this option, the CPU structures that must always be present in the
CPU's page tables are moved on L4 slot 384, which means address
0xffffc00000000000.
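
The slot-to-address correspondence can be checked with a short sketch in C. It assumes 4-level paging with 48-bit canonical virtual addresses, where each L4 slot covers 2^39 bytes; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L4_SHIFT	39	/* each L4 slot maps 512 GiB */
#define L4_SLOTS	512

/* Base virtual address covered by a given L4 slot. */
static inline uint64_t
l4_slot_to_va(unsigned int slot)
{
	uint64_t va;

	assert(slot < L4_SLOTS);
	va = (uint64_t)slot << L4_SHIFT;
	/* Canonical form: copy bit 47 into bits 48..63. */
	if (va & (UINT64_C(1) << 47))
		va |= UINT64_C(0xffff000000000000);
	return va;
}
```

Slot 384 shifts to 0x0000c00000000000, and sign extension of bit 47 gives the address quoted in the commit message.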

A new pcpu_area structure is defined. It contains shared structures (IDT,
LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci).
Theoretically the LDT should be in the array, but this will be done later.

During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a
page tree that is able to map the pcpu_area structure entirely. cpu0 then
immediately maps the shared structures. Later, every CPU goes through
cpu_pcpuarea_init, which allocates physical pages and kenters the relevant
pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.

The point of this change is to make sure that the structures that must
always be present in the page tables have their own L4 slot. Until now
their L4 slot was that of pmap_kernel, and making a distinction between
what must be mapped and what does not need to be was complicated.

Even in the non-speculative-bug case this change makes some sense: there
are several x86 instructions that leak the addresses of the CPU structures,
and putting these structures inside pmap_kernel actually offered a way to
compute the address of the kernel heap - which would have made ASLR on it
plainly useless, had we implemented that.

Note that, for now, pcpuarea does not contain rsp0.

Unfortunately this change adds many #ifdefs, and makes the code harder to
understand. There is also some duplication, but that will be solved later.


Revision tags: tls-maxphys-base-20171202
# 1.141 11-Nov-2017 maxv

Recommit

http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html

but use __INITIAL_MXCSR_MASK__ on Xen until someone figures out what's
wrong with the Xen fpu.


# 1.140 11-Nov-2017 bouyer

Revert http://mail-index.netbsd.org/source-changes/2017/11/08/msg089525.html,
it breaks Xen:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/201711082340Z_anita.txt


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
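
The base/extended/display distinction described in 1.106 can be sketched
as below. This is only an illustration, assuming the field layout of
CPUID leaf 1 EAX from the vendors' documentation; the function names are
hypothetical and are not the kernel macros, whose exact definitions may
differ.

```c
#include <stdint.h>

/* Hypothetical helpers; bit positions are those of CPUID leaf 1 EAX. */
static inline uint32_t
cpuid_stepping(uint32_t eax)
{
	return eax & 0xf;			/* bits 3:0 */
}

static inline uint32_t
cpuid_base_family(uint32_t eax)
{
	return (eax >> 8) & 0xf;		/* bits 11:8 */
}

static inline uint32_t
cpuid_base_model(uint32_t eax)
{
	return (eax >> 4) & 0xf;		/* bits 7:4 */
}

/* Display family: the extended family is added only when base is 0xf. */
static inline uint32_t
cpuid_display_family(uint32_t eax)
{
	uint32_t family = cpuid_base_family(eax);

	if (family == 0xf)
		family += (eax >> 20) & 0xff;	/* bits 27:20 */
	return family;
}

/*
 * Display model: the extended model supplies the high nibble when the
 * base family is 0x6 or 0xf (assumed rule, per the Intel documentation).
 */
static inline uint32_t
cpuid_display_model(uint32_t eax)
{
	uint32_t base_family = cpuid_base_family(eax);
	uint32_t model = cpuid_base_model(eax);

	if (base_family == 0x6 || base_family == 0xf)
		model |= ((eax >> 16) & 0xf) << 4;	/* bits 19:16 */
	return model;
}
```

For a signature of 0x000306a9 the helpers give family 0x6, model 0x3a and
stepping 9; for 0x00800f11 they give family 0x17 and model 0x01.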


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
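
For reference, the operation that a power-of-two round-up performs can be
sketched as follows. The helper name is hypothetical and the code assumes
the alignment is a nonzero power of two; it illustrates the arithmetic
only and is not the kernel's roundup2() macro itself.

```c
#include <stdint.h>

/*
 * Round x up to the next multiple of align (hypothetical helper).
 * Only valid when align is a nonzero power of two: adding align - 1
 * carries into the next multiple unless x is already aligned, and the
 * mask then clears the low bits.
 */
static inline uintptr_t
round_up_pow2(uintptr_t x, uintptr_t align)
{
	return (x + (align - 1)) & ~(align - 1);
}
```

With an alignment of 8, a value of 13 rounds up to 16 while 16 stays at 16.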


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
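
The "greatest power of two which is a divisor" rule from 1.32 amounts to
isolating the lowest set bit of the color count. A minimal sketch,
assuming an unsigned 32-bit count; the helper name is hypothetical and
this is not necessarily how the committed code computes it.

```c
#include <stdint.h>

/*
 * Largest power of two that divides n (hypothetical helper).
 * n & -n keeps only the lowest set bit of n, and that bit is the
 * biggest power of two dividing n.  The negation is written as
 * ~n + 1 so it stays within unsigned arithmetic.  Returns 0 for
 * n == 0.
 */
static inline uint32_t
greatest_pow2_divisor(uint32_t n)
{
	return n & (~n + 1u);
}
```

For example, a desired count of 24 colors yields 8 and a count of 12
yields 4, while a count that is already a power of two is returned
unchanged.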


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shut down other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.139 08-Nov-2017 maxv

Call fpuinit_mxcsr_mask in cpu_init, after cr4 is initialized, but before
touching xcr0. Then use clts/stts instead of modifying cr0, and enable the
mxcsr_mask detection on Xen.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
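
The split between base, extended and display fields can be sketched as
below. This is an illustration only: the sig_* helper names are
hypothetical and are not the macros from the commit. The bit positions
and the combination rules assumed here are the ones documented for
CPUID leaf 1 EAX (the extended family is added only when the base
family is 0xf, the extended model is prepended only when the base
family is 0x6 or 0xf).

```c
#include <stdint.h>

/*
 * Hypothetical helpers illustrating the field split; these are not
 * the CPUID_TO_* macros themselves.
 */
static inline uint32_t sig_stepping(uint32_t sig)    { return sig & 0xf; }
static inline uint32_t sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family counts only if base family is 0xf. */
static inline uint32_t sig_display_family(uint32_t sig)
{
	uint32_t base = sig_base_family(sig);

	return base == 0xf ? base + sig_ext_family(sig) : base;
}

/* Display model: the extended model is used for base families 0x6 and 0xf. */
static inline uint32_t sig_display_model(uint32_t sig)
{
	uint32_t family = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (family == 0x6 || family == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

For a signature of 0x000906ea the helpers give family 0x6, model 0x9e
and stepping 0xa; for 0x00800f11 they give family 0x17 and model 0x01.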


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shutdown the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
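
The arithmetic behind those figures can be sketched as follows. The
function names are hypothetical and only illustrate why a 36-bit
physical address no longer fits in a 32-bit variable; they are not
taken from the patch.

```c
#include <stdint.h>

/* Size in bytes of an address space that is 'bits' wide (bits < 64). */
static inline uint64_t addr_space_bytes(unsigned bits)
{
	return (uint64_t)1 << bits;
}

/* Nonzero if a physical address would survive truncation to 32 bits. */
static inline int fits_in_32_bits(uint64_t pa)
{
	return pa <= UINT32_MAX;
}
```

With these, addr_space_bytes(36) is 64 GiB and addr_space_bytes(32) is
4 GiB, and any physical address at or above 4 GiB fails
fits_in_32_bits(), which is why the addresses are carried in 64-bit
variables.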

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
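
For reference, the operation that a power-of-two round-up performs can
be sketched as below. The function name round_up_pow2 is hypothetical
and the code is an illustration of the usual add-and-mask idiom, not
the kernel's roundup2() definition or the code touched by this commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of align.
 * Only valid when align is a nonzero power of two.
 */
static inline uintptr_t round_up_pow2(uintptr_t x, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + (align - 1)) & ~(align - 1);
}
```

For example, round_up_pow2(13, 8) yields 16, and a value that is
already aligned, such as round_up_pow2(16, 8), is returned unchanged.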


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
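
As a rough illustration of the workaround described above (not the code that was committed), the greatest power of two dividing a color count can be found by isolating its lowest set bit. The helper name largest_pow2_divisor is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper, not the kernel's code: return the greatest power
 * of two that divides n.  In unsigned arithmetic, n & (0 - n) isolates
 * the lowest set bit of n, which is exactly that divisor.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);		/* zero has no lowest set bit */
	return n & (0u - n);
}
```

For example, a probed count of 24 colors would be reduced to 8, while a count that is already a power of two is returned unchanged.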


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.138 17-Oct-2017 maxv

Have the cpu clear PSL_D automatically when entering the kernel via a
syscall. Then, don't clear PSL_D and PSL_AC in the syscall entry point,
they are now both cleared by the cpu (faster). However they still need to
be manually cleared in the interrupt/trap entry points.


# 1.137 17-Oct-2017 maxv

Add support for SMAP on amd64.

PSL_AC is cleared from %rflags in each kernel entry point. In the copy
sections, a copy window is opened and the kernel can touch userland
pages. This window is closed when the kernel is done, either at the end
of the copy sections or in the fault-recover functions.

This implementation is not optimized yet, due to the fact that INTRENTRY
is a macro, and we can't hotpatch macros.

Sent on tech-kern@ a month or two ago, tested on a Kabylake.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
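
The split between "base", "extended" and "display" values listed above can be sketched as follows. This is an illustration under stated assumptions, not the macros from the commit: the field layout is that of CPUID leaf 1 EAX, the combination rule follows the Intel convention (extended family added only when the base family is 0xf, extended model used when the base family is 0x6 or 0xf), and all function names are hypothetical.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature (hypothetical helpers). */
static uint32_t
base_family(uint32_t sig)
{
	return (sig >> 8) & 0xf;
}

static uint32_t
base_model(uint32_t sig)
{
	return (sig >> 4) & 0xf;
}

static uint32_t
ext_family(uint32_t sig)
{
	return (sig >> 20) & 0xff;
}

static uint32_t
ext_model(uint32_t sig)
{
	return (sig >> 16) & 0xf;
}

/* Display family: the extended field only counts when the base is 0xf. */
static uint32_t
display_family(uint32_t sig)
{
	uint32_t fam = base_family(sig);

	return fam == 0xf ? fam + ext_family(sig) : fam;
}

/* Display model: the extended field supplies the high nibble. */
static uint32_t
display_model(uint32_t sig)
{
	uint32_t fam = base_family(sig);
	uint32_t model = base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= ext_model(sig) << 4;
	return model;
}
```

Keeping the raw-field helpers separate from the display helpers mirrors the naming split in the commit message: callers that need the architectural value use the display form, and only code that decodes the signature itself touches the base and extended fields.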


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
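
For readers unfamiliar with the idiom, a power-of-two round-up can be
sketched as below. This is an illustration only: round_up_pow2() is a
hypothetical name, not the kernel's roundup2() macro, and it assumes the
alignment is a nonzero power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel macro): round x up to the next
 * multiple of align.  Assumes align is a nonzero power of two, so that
 * (align - 1) is a mask of the low bits to clear.
 */
static inline uintptr_t
round_up_pow2(uintptr_t x, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}
```

Because the alignment is a power of two, the operation needs only an add
and a mask, with no division.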


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@: http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
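
The workaround described above (greatest power of two that divides the
desired colour count) can be sketched as follows. This is not the committed
code; largest_pow2_divisor() is a hypothetical name and the input is assumed
to be nonzero.

```c
#include <assert.h>

/*
 * Hypothetical helper: return the greatest power of two that divides n.
 * In unsigned arithmetic (~n + 1) is the two's complement negation of n,
 * so the AND isolates the lowest set bit, which is exactly that divisor.
 * Assumes n != 0.
 */
static inline unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

For a probed value of 12 colours, for instance, this yields 4, and a value
that is already a power of two is returned unchanged.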


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.136 28-Sep-2017 maxv

Pack the useful variables at the end of the trampoline page; eliminates
a hard-coded dependency on KERNBASE. Note that I cannot test this change
on i386 right now, but it seems fine enough.


# 1.135 17-Sep-2017 maxv

Remove TRAPLOG from i386. Nowadays there are better instrumentation tools,
in both software and hardware.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify code around cpu_identify() to not break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
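
As a rough illustration of the difference between the base and display
values, the sketch below decodes a CPUID leaf 1 EAX signature using the
convention documented by Intel: the extended family is added only when the
base family is 0xF, and the extended model is prepended only when the base
family is 6 or 0xF. The function names are hypothetical and are not the
macros from specialreg.h.

```c
#include <stdint.h>

/* Field extraction from CPUID leaf 1 EAX (hypothetical helper names). */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: base family, plus extended family iff base is 0xF. */
static inline uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);

	return (fam == 0xf) ? fam + sig_extfamily(sig) : fam;
}

/* Display model: extended model forms the high nibble iff base family
 * is 6 or 0xF; otherwise only the base model is meaningful. */
static inline uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

With the signature 0x000306A9 this gives family 6, model 0x3A, stepping 9;
with 0x00600F12 it gives family 0x15, model 0x01, stepping 2.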


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed a debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shutdown the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
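
The sizes quoted above follow from the address widths. The sketch below
checks the arithmetic; paddr64_t and addr_space_bytes() are hypothetical
names used only for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit physical address type, as a PAE kernel would need. */
typedef uint64_t paddr64_t;

/* Number of bytes addressable with the given address width in bits. */
static paddr64_t
addr_space_bytes(unsigned int bits)
{
	assert(bits < 64);	/* a shift by 64 or more would be undefined */
	return (paddr64_t)1 << bits;
}
```

A 36-bit physical address no longer fits in a 32-bit variable, which is why
the text above says such addresses are handled as 64-bit quantities.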

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
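
The log does not show the expression that was replaced. As an illustration
only, rounding up to a power-of-two boundary is commonly written as below;
round_up_pow2() is a hypothetical helper and not the kernel's roundup2()
macro.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of align, where align must be a
 * power of two. Adding (align - 1) and masking off the low bits avoids
 * a division.
 */
static size_t
round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}
```

Values that are already aligned are returned unchanged.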


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
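
The selection rule in this commit can be sketched as a function pointer
chosen once at startup. Everything named here (struct idle_env,
select_idle(), cpu_idle_hook, cpu_idle_model() and the idle_* stubs) is
hypothetical; the stubs only record which routine ran so the choice can be
observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum idle_kind { IDLE_NONE, IDLE_XEN, IDLE_MWAIT, IDLE_HALT };
typedef void (*idle_fn)(void);

static enum idle_kind last_idle = IDLE_NONE;

/* Stub idle routines; real ones would issue a hypercall, MWAIT or HLT. */
static void idle_xen(void) { last_idle = IDLE_XEN; }
static void idle_mwait(void) { last_idle = IDLE_MWAIT; }
static void idle_halt(void) { last_idle = IDLE_HALT; }

struct idle_env {
	bool is_xen;	/* running under Xen */
	bool has_mwait;	/* CPU advertises MONITOR/MWAIT */
	bool is_amd;	/* vendor is AMD */
};

/* Xen first, then mwait on non-AMD CPUs that support it, else halt. */
static idle_fn
select_idle(const struct idle_env *env)
{
	assert(env != NULL);
	if (env->is_xen)
		return idle_xen;
	if (env->has_mwait && !env->is_amd)
		return idle_mwait;
	return idle_halt;
}

/* The idle entry point is a macro that calls through the pointer. */
static idle_fn cpu_idle_hook = idle_halt;
#define cpu_idle_model()	((*cpu_idle_hook)())
```

An AMD CPU with MWAIT support still ends up on the halt path, matching the
Barcelona observation in the message above.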


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
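
One common way to compute the greatest power of two that divides a number is
to isolate its lowest set bit. The sketch below assumes that approach; the
log does not show how the commit itself computes the value, and
largest_pow2_divisor() is a hypothetical name.

```c
#include <assert.h>

/*
 * Return the greatest power of two that divides n, which is the lowest
 * set bit of n. (~n + 1u) is the two's complement negation of n.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);		/* zero has no lowest set bit */
	return n & (~n + 1u);
}
```

For example, 24 colors would be reduced to 8, and an odd count would be
reduced to 1.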


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if a IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.134 27-Aug-2017 maxv

style, and move some i386-specific code into i386/


# 1.133 27-Aug-2017 maxv

Localify. By the way, we should use a different stack for NMIs.


Revision tags: nick-nhusb-base-20170825
# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with AB_VERBOSE
or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
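
The difference between the base, extended and display values can be sketched
as follows. This is an illustration that assumes the field layout and
combination rule from Intel's CPUID documentation; the sig_* helper names
are hypothetical and are not the macros listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature (EAX). */
static uint32_t sig_stepping(uint32_t sig) { return sig & 0xf; }
static uint32_t sig_base_model(uint32_t sig) { return (sig >> 4) & 0xf; }
static uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static uint32_t sig_ext_model(uint32_t sig) { return (sig >> 16) & 0xf; }
static uint32_t sig_ext_family(uint32_t sig) { return (sig >> 20) & 0xff; }

/* Display family: the extended field is added only when base is 0xf. */
static uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t base = sig_base_family(sig);

	return base + (base == 0xf ? sig_ext_family(sig) : 0);
}

/* Display model: the extended field is used for base family 0x6 or 0xf. */
static uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t base = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (base == 0x6 || base == 0xf)
		model |= sig_ext_model(sig) << 4;
	assert(model <= 0xff);	/* two 4-bit fields combined */
	return model;
}
```

For the signature 0x000306a9 this gives family 6, model 0x3a and
stepping 9; for 0x00800f11 it gives family 0x17 and model 1.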


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark a few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
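
For readers unfamiliar with the idiom, the following is a minimal sketch of power-of-two rounding of the kind roundup2() performs. The function name roundup_pow2_multiple is hypothetical and is not the kernel macro; it only illustrates the mask arithmetic, assuming the alignment m is a non-zero power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical illustration, not the kernel's roundup2() macro:
 * round x up to the next multiple of m, where m must be a non-zero
 * power of two.  Adding m - 1 and clearing the low bits with the
 * mask ~(m - 1) avoids a division.
 */
static size_t
roundup_pow2_multiple(size_t x, size_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);	/* m is a power of two */
	return (x + m - 1) & ~(m - 1);
}
```

The mask trick only works for power-of-two alignments; for any other m the result is not a multiple of m.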


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
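
The workaround described above (take the greatest power of two that divides the desired color count) can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the helper name largest_pow2_divisor is hypothetical, and the bit trick relies on unsigned arithmetic.

```c
#include <assert.h>

/*
 * Hypothetical helper, not the kernel's code: return the greatest
 * power of two that divides n.  For unsigned n, (~n + 1) is the
 * two's complement of n, so n & (~n + 1) isolates the lowest set
 * bit, which is exactly the largest power-of-two divisor.
 */
static unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);		/* zero has no greatest power-of-two divisor */
	return n & (~n + 1u);
}
```

For example, a probed count of 24 colors would be reduced to 8, while a count that is already a power of two is returned unchanged.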


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.132 28-Jul-2017 riastradh

cpu_trace is no more, remove vestige of it that broke ALL kernel.


Revision tags: perseant-stdc-iso10646-base
# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

branches: 1.130.2;

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with
AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
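
The split between "base", "extended" and "display" values can be illustrated with a minimal sketch. The helper names below are hypothetical and are not the macros from this commit; the field layout is the one documented for CPUID leaf 1 EAX, and the display-model rule follows the Intel convention (extended model applied for base family 0x6 or 0xf), which is an assumption about what the macros do.

```c
#include <stdint.h>

/* Hypothetical helpers; bit positions follow CPUID leaf 1 EAX. */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended field is added only when the base is 0xf. */
static inline uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);

	return base == 0xf ? base + sig_extfamily(sig) : base;
}

/*
 * Display model: the extended field supplies the high nibble when the
 * base family is 0x6 or 0xf (Intel convention, assumed here).
 */
static inline uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (base == 0x6 || base == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, the signature 0x000306a9 decodes to display family 0x6 and display model 0x3a, while its base model alone is 0xa.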


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel system.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
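
For readers unfamiliar with the idiom, a power-of-two round-up is usually a mask operation. The following is a minimal sketch with a hypothetical function name, not the kernel macro itself; it assumes `m` is a non-zero power of two.

```c
#include <stddef.h>

/*
 * Round x up to the next multiple of m.
 * Only valid when m is a power of two: m - 1 is then a mask of the
 * low bits, so adding it and clearing those bits rounds upward.
 */
static size_t
round_up_pow2(size_t x, size_t m)
{
	return (x + (m - 1)) & ~(m - 1);
}
```

With `m` equal to 8, an input of 13 rounds to 16 and an input of 16 stays at 16.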


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
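
The selection rule described above can be sketched as a function pointer chosen once at startup. All names here are hypothetical stand-ins, and the routine bodies are left empty because the real ones need privileged instructions; this only illustrates the dispatch order, not the committed code.

```c
#include <stdbool.h>

typedef void (*idle_fn_t)(void);

/* Stand-ins for the real idle routines; bodies are deliberately empty. */
static void idle_xen(void)   { }
static void idle_mwait(void) { }
static void idle_halt(void)  { }

/* The hook that the macro dispatches through; defaults to halt. */
static idle_fn_t idle_hook = idle_halt;
#define cpu_idle_sketch()	((*idle_hook)())

/*
 * Pick an idle routine: Xen first, then mwait when the CPU supports
 * it and is not AMD, and halt as the fallback.
 */
static idle_fn_t
choose_idle(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		return idle_xen;
	if (has_mwait && !is_amd)
		return idle_mwait;
	return idle_halt;
}
```

A caller would assign `idle_hook = choose_idle(...)` during CPU setup and then invoke `cpu_idle_sketch()` from the idle loop.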


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
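
The greatest power of two that divides a number is its lowest set bit, which can be isolated with a single mask. This is a minimal sketch under that observation; the function name is hypothetical and the commit may compute the value differently (for example with a loop).

```c
/*
 * Return the largest power of two that divides ncolors.
 * ~ncolors + 1 is the two's-complement negation, so the AND keeps
 * only the lowest set bit. Returns 0 when ncolors is 0.
 */
static unsigned int
pow2_divisor(unsigned int ncolors)
{
	return ncolors & (~ncolors + 1u);
}
```

For a probed value of 24 colors this yields 8, and a value that is already a power of two is returned unchanged.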


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
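
As an illustration of why atomics matter for a shared flags word, here is a minimal sketch using C11 `<stdatomic.h>` rather than the kernel's own atomic primitives. The structure, flag values and function names are all hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define FLAG_RUNNING	0x0001u
#define FLAG_PRESENT	0x0002u

struct cpu_sketch {
	_Atomic uint32_t flags;
};

/* Set bits with one atomic read-modify-write, so no update is lost. */
static void
flags_set(struct cpu_sketch *ci, uint32_t bits)
{
	atomic_fetch_or(&ci->flags, bits);
}

/* Clear bits the same way, leaving other bits untouched. */
static void
flags_clear(struct cpu_sketch *ci, uint32_t bits)
{
	atomic_fetch_and(&ci->flags, ~bits);
}
```

A plain `ci->flags |= bits` on a non-atomic word would be a separate load and store, which can drop a concurrent update from another CPU.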


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.131 10-Jun-2017 pgoyette

Further reduce the loop counter so that hatching completes before the
boot processor times us out.

Add a nice big XXX comment for why the counter is so low.

XXX Will need to pullup to NetBSD-7 branch


Revision tags: netbsd-8-base
# 1.130 31-May-2017 kre

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with
AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
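
The split between base, extended and display values can be sketched as
follows. This is an illustration only, not the macros from specialreg.h:
the function names are hypothetical, and the combination rule is the one
commonly documented for the CPUID leaf 1 EAX signature (extended family is
added only when the base family is 0xf; extended model is used only when
the base family is 0x6 or 0xf).

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature (hypothetical names). */
static inline uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }
static inline uint32_t sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }

/* Display family: the extended field counts only when base family is 0xf. */
static inline uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);

	return (fam == 0xf) ? fam + sig_ext_family(sig) : fam;
}

/* Display model: the extended field counts for base family 0x6 or 0xf. */
static inline uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

For a signature of 0x000306a9 this yields family 0x6 and model 0x3a; for
0x00100f42 it yields family 0x10 and model 0x4.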


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPU have cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
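
For context, rounding up to a power-of-two boundary is usually done with
an add and a mask. The sketch below shows that operation under a
hypothetical name; it assumes the alignment is a nonzero power of two and
is not the kernel's own roundup2() definition.

```c
#include <stdint.h>

/*
 * Round x up to the next multiple of align.
 * Assumption: align is a nonzero power of two, so (align - 1) is a
 * mask of the low bits that must end up cleared.
 */
static inline uintptr_t
round_up_pow2(uintptr_t x, uintptr_t align)
{
	return (x + align - 1) & ~(align - 1);
}
```

A value that is already aligned is returned unchanged, for example
round_up_pow2(16, 8) is 16 while round_up_pow2(13, 8) is also 16.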


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
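
The "greatest power of two which is a divisor" step can be sketched with
a single bit operation. The helper name below is hypothetical and the
snippet is an illustration of the technique, not the code that was
committed.

```c
#include <stdint.h>

/*
 * Return the largest power of two that divides n.
 * Assumption: n is nonzero. The lowest set bit of n is exactly that
 * divisor; n & (~n + 1) isolates it (two's-complement n & -n).
 */
static inline uint32_t
largest_pow2_divisor(uint32_t n)
{
	return n & (~n + 1);
}
```

With 24 desired colors this gives 8, and with 12 it gives 4; a value that
is already a power of two is returned unchanged.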


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.130 31-May-2017 kre

And now do what 1.128 should have done, and put back the (now re-)used
variable that had earlier been deleted, when its use was removed in
1.126, but wasn't restored in 1.127.


# 1.129 31-May-2017 kre

Revert previous. Removing unused variable declarations is only a good
idea when the variable is, in fact, unused.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so that it does not break the dmesg of
cpus with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
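
The split between "base", "extended" and "display" fields can be illustrated
with a minimal sketch in C. The sketch_* helpers below are hypothetical names
invented for this example and are not the kernel's macros. They follow the
commonly documented CPUID leaf 1 EAX layout; whether the kernel macros apply
the extended model for base family 6 in the same way is an assumption here.

```c
#include <stdint.h>

/* Hypothetical helpers (not the kernel macros): decode CPUID leaf 1 EAX. */
static inline uint32_t sketch_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sketch_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sketch_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sketch_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sketch_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when the base is 0xf. */
static inline uint32_t sketch_family(uint32_t sig)
{
	uint32_t base = sketch_basefamily(sig);

	return (base == 0xf) ? base + sketch_extfamily(sig) : base;
}

/*
 * Display model: the extended model supplies the high nibble when the
 * base family is 0x6 or 0xf (assumed convention for this sketch).
 */
static inline uint32_t sketch_model(uint32_t sig)
{
	uint32_t base = sketch_basefamily(sig);
	uint32_t model = sketch_basemodel(sig);

	if (base == 0x6 || base == 0xf)
		model |= sketch_extmodel(sig) << 4;
	return model;
}
```

For a signature such as 0x000306a9 the base model is 0xa while the display
model is 0x3a, which is why callers that compare against model numbers
generally want the display variant.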


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shutdown the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
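
For readers unfamiliar with the idiom, rounding up to a power-of-two boundary
can be sketched as below. This is an illustrative stand-in, not the kernel's
roundup2() definition; the name sketch_roundup2 and the use of size_t are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a roundup2()-style helper: round x up to the
 * next multiple of m, where m must be a nonzero power of two.
 */
static inline size_t sketch_roundup2(size_t x, size_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);	/* power-of-two check */
	return (x + m - 1) & ~(m - 1);
}
```

The mask form avoids a division, which is the usual reason to prefer it over
an open-coded ((x + m - 1) / m) * m when the alignment is known to be a
power of two.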


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes it
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
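
One common way to obtain the greatest power of two that divides a number is
to isolate its lowest set bit. The sketch below shows that technique only; it
is not taken from the commit, and the function name is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: return the greatest power of two dividing n, which
 * is the lowest set bit of n. n must be nonzero.
 */
static inline unsigned int sketch_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (~n + 1u);	/* two's-complement trick: n & -n */
}
```

With 12 desired colors this yields 4, and with 48 it yields 16; a value that
is already a power of two is returned unchanged.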


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.
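
As a rough illustration of the idea, the sketch below updates a flags word with atomic read-modify-write operations instead of a plain load/modify/store. It uses portable C11 <stdatomic.h> rather than the kernel's own atomic primitives, and the structure, flag values and function names are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; not the real CPUF_* values. */
#define EX_CPUF_PRESENT	0x0001u
#define EX_CPUF_RUNNING	0x0002u

/* Hypothetical stand-in for the per-CPU structure. */
struct ex_cpu {
	_Atomic uint32_t flags;
};

/* Set bits without losing concurrent updates from other CPUs. */
static inline void
ex_cpu_set_flags(struct ex_cpu *ci, uint32_t bits)
{
	assert(ci != NULL);
	atomic_fetch_or(&ci->flags, bits);
}

/* Clear bits atomically. */
static inline void
ex_cpu_clear_flags(struct ex_cpu *ci, uint32_t bits)
{
	assert(ci != NULL);
	atomic_fetch_and(&ci->flags, ~bits);
}

/* Read the current flags word. */
static inline uint32_t
ex_cpu_flags(struct ex_cpu *ci)
{
	assert(ci != NULL);
	return atomic_load(&ci->flags);
}
```

The point of the atomic form is that two CPUs changing different bits at the same time cannot overwrite each other's update.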


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.128 31-May-2017 pgoyette

Remove unused variable (I reverted too much in previous commit!)


# 1.127 31-May-2017 pgoyette

Partially revert previous. Rather than completely removing the loop
around calls to x86_pause(), just drastically reduce the repeat count.
It's still good to have some real delay here (among other things, for
letting the TSCs drift).

As discussed on IRC


# 1.126 31-May-2017 maya

Do not pause many times between testing if the CPU can go.

This only impacts QEMU as QEMU's implementation of pause is
significantly slower than its implementation of nop.

PR kern/51623: running qemu-x86_64 with -smp 4 - the additional
CPUs don't start.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so that it does not break the dmesg of
cpus with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zerod),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus; I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
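
The difference between the base fields and the display values can be sketched as below. This follows the commonly documented CPUID leaf 1 layout and combination rule (extended family is added when the base family is 0xF; extended model is prepended when the base family is 6 or 0xF). The function names are hypothetical and are not the CPUID_TO_* macros themselves, whose exact conditions may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 EAX signature. */
static inline uint32_t ex_stepping(uint32_t sig)    { return sig & 0xf; }
static inline uint32_t ex_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t ex_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t ex_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t ex_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: extended family only counts when base family is 0xF. */
static inline uint32_t
ex_display_family(uint32_t sig)
{
	uint32_t base = ex_base_family(sig);

	return base == 0xf ? base + ex_ext_family(sig) : base;
}

/* Display model: extended model supplies the high nibble for families 6 and 0xF. */
static inline uint32_t
ex_display_model(uint32_t sig)
{
	uint32_t base = ex_base_family(sig);
	uint32_t model = ex_base_model(sig);

	if (base == 0x6 || base == 0xf)
		model |= ex_ext_model(sig) << 4;
	assert(model <= 0xff);	/* the combined model is an 8-bit value */
	return model;
}
```

For the signature 0x000306A9 this yields family 6, model 0x3A and stepping 9; for 0x00600F12 it yields family 0x15, model 1 and stepping 2.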


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of physical addresses, they are manipulated as
64-bit variables by the kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
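
For readers unfamiliar with the idiom, rounding up to a power-of-two boundary is usually done with an add and a mask. The sketch below uses a hypothetical helper name rather than the kernel's roundup2() macro, and it assumes the alignment is a non-zero power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of align.
 * Hypothetical stand-in for a roundup2()-style helper;
 * align must be a non-zero power of two.
 */
static inline uintptr_t
ex_round_up_pow2(uintptr_t x, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	/* Adding align - 1 and clearing the low bits rounds up. */
	return (x + (align - 1)) & ~(align - 1);
}
```

Values that are already aligned are returned unchanged, so ex_round_up_pow2(64, 64) is 64 while ex_round_up_pow2(65, 64) is 128.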


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
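
As an illustration of the divisor rule described above, here is a minimal
sketch in C. It is not the committed code; the helper name
largest_pow2_divisor() is hypothetical and chosen only for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from cpu.c): return the greatest power
 * of two that divides n. Isolating the lowest set bit of n gives exactly
 * that value, because any power of two dividing n cannot exceed the
 * lowest set bit. Returns 0 when n is 0.
 */
static uint32_t
largest_pow2_divisor(uint32_t n)
{
	/* ~n + 1 is the two's complement negation, well defined for unsigned. */
	return n & (~n + 1u);
}
```

For example, a probed count of 24 colors would be reduced to 8, and a count
that is already a power of two is returned unchanged.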


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.125 23-May-2017 nonaka

x86: Add preliminary x2APIC support.

x2APIC is used only when x2APIC is enabled in BIOS/UEFI.
LAPIC ID is not supported above 256.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
system. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify code around cpu_identify() so as not to break the dmesg of cpus with
AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
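
To make the base/extended/display distinction concrete, here is a minimal
sketch in C of how the display family and display model are conventionally
derived from the CPUID leaf 1 EAX signature. It follows the rule documented
for Intel processors and is an assumption for illustration only; the function
names are hypothetical and are not the CPUID_TO_* macros themselves, whose
exact definitions live in the NetBSD headers.

```c
#include <stdint.h>

/* Raw bit fields of the CPUID leaf 1 EAX signature. */
static uint32_t base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static uint32_t base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static uint32_t ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }
static uint32_t ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }

/* Display family: the extended field is added only when the base is 0xf. */
static uint32_t
display_family(uint32_t sig)
{
	uint32_t fam = base_family(sig);

	return fam == 0xf ? fam + ext_family(sig) : fam;
}

/*
 * Display model: the extended field supplies the high nibble only when
 * the base family is 0x6 or 0xf (assumed rule, per the Intel convention).
 */
static uint32_t
display_model(uint32_t sig)
{
	uint32_t fam = base_family(sig);
	uint32_t model = base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= ext_model(sig) << 4;
	return model;
}
```

With the signature 0x000306a9, the base model is 0xa but the display model is
0x3a, which shows why code comparing against the base field alone can
misidentify a CPU.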


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter useable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
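
For readers unfamiliar with the operation, the following is a minimal sketch
in C of the power-of-two rounding that a roundup2()-style macro performs. The
function name round_up_pow2() is hypothetical, and the sketch assumes the
alignment m is a nonzero power of two; it is not a copy of the kernel macro.

```c
#include <stdint.h>

/*
 * Hypothetical illustration: round x up to the next multiple of m,
 * where m is assumed to be a nonzero power of two. Adding m - 1 and
 * masking off the low bits avoids a division.
 */
static uint64_t
round_up_pow2(uint64_t x, uint64_t m)
{
	return (x + (m - 1)) & ~(m - 1);
}
```

For instance, rounding 13 up to an alignment of 8 yields 16, while a value
that is already aligned is returned unchanged.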


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
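
One way to compute the greatest power of two that divides a given count is to isolate its lowest set bit. The sketch below is an assumption about how such a workaround could look, not the code from this commit; the function name pow2_divisor is hypothetical.

```c
#include <assert.h>

/*
 * Return the greatest power of two that divides ncolors. For a nonzero
 * value this is the lowest set bit, isolated with n & -n (written as
 * 0u - n to keep the unsigned negation explicit).
 */
static inline unsigned int
pow2_divisor(unsigned int ncolors)
{
	assert(ncolors != 0);
	return ncolors & (0u - ncolors);
}
```

For example, a probed count of 24 colors would be reduced to 8, and a count that is already a power of two is returned unchanged.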


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


Revision tags: prg-localcount2-base pgoyette-localcount-20170426
# 1.124 22-Apr-2017 nonaka

use CR8 instead of LAPIC Task Priority register on x86-64.


Revision tags: bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

branches: 1.121.2;
Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify code around cpu_identify() so as not to break the dmesg of cpus with
AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
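
The distinction between the base, extended, and display values can be sketched as follows. This assumes the standard CPUID leaf 1 EAX layout (stepping in bits 3:0, base model in 7:4, base family in 11:8, extended model in 19:16, extended family in 27:20) and the commonly documented rule that the extended fields only apply for certain base families. The sig_* helper names are hypothetical and are not the macros named above, whose exact definitions may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Raw field extraction from a CPUID leaf 1 EAX signature. */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base family is 0xf. */
static inline uint32_t
sig_family(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);

	return base == 0xf ? base + sig_extfamily(sig) : base;
}

/* Display model: the extended model supplies the high nibble for families 6 and 0xf. */
static inline uint32_t
sig_model(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (base == 0x6 || base == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}

/* Worked example on the signature value 0x000306a9. */
static inline void
sig_example(void)
{
	assert(sig_family(0x000306a9) == 0x6);
	assert(sig_model(0x000306a9) == 0x3a);
	assert(sig_stepping(0x000306a9) == 0x9);
}
```

Keeping the base and extended accessors separate from the display ones makes it harder to compare a raw base field against a display value by mistake.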


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shutdown the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
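
To make the address widths concrete, the sketch below shows how the hardware splits a 32-bit virtual address under PAE: 2 bits select one of the 4 L3 entries, 9 bits index each of the next two levels (512 eight-byte entries per page), and 12 bits give the page offset. This is the generic x86 PAE layout for 4 KiB pages; the struct and function names are hypothetical, and the pmap code in this commit presents the same layout differently.

```c
#include <assert.h>
#include <stdint.h>

/* Physical addresses are 36 bits wide under PAE: 2^36 bytes = 64 GiB. */
#define PAE_PA_BITS	36

/* Indices obtained from one 32-bit virtual address (4 KiB pages). */
struct pae_va_split {
	unsigned int l3;	/* bits 31:30, one of 4 L3 entries */
	unsigned int l2;	/* bits 29:21, one of 512 directory entries */
	unsigned int l1;	/* bits 20:12, one of 512 page table entries */
	unsigned int off;	/* bits 11:0, byte offset within the page */
};

static inline struct pae_va_split
pae_split(uint32_t va)
{
	struct pae_va_split s;

	s.l3 = (va >> 30) & 0x3;
	s.l2 = (va >> 21) & 0x1ff;
	s.l1 = (va >> 12) & 0x1ff;
	s.off = va & 0xfff;

	/* 2 + 9 + 9 + 12 bits cover the full 32-bit address. */
	assert(((uint32_t)s.l3 << 30 | (uint32_t)s.l2 << 21 |
	    (uint32_t)s.l1 << 12 | s.off) == va);
	return s;
}
```

Because each entry is 64 bits wide, it has room for a physical frame number above 4 GiB, while the virtual address being translated stays at 32 bits.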

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
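
For illustration, the kind of open-coded rounding that a helper such as
roundup2() replaces can be sketched in portable C as below. The function
name round_up_pow2 is hypothetical and is not the kernel macro; the sketch
assumes the alignment is a non-zero power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of align.
 * Assumption: align is a non-zero power of two, so (align - 1) is a
 * mask of the low bits that must be cleared after adding the slack.
 */
size_t
round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}
```

Naming the operation once keeps the power-of-two precondition in a single
place rather than repeating the mask arithmetic at every call site.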


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@: http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting the state of a CPU to offline if there are bound LWPs
which have no CPU to migrate to.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
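
A minimal sketch of the workaround described above, assuming the desired
color count is a non-zero unsigned integer: the greatest power of two that
divides n is its lowest set bit. The function name largest_pow2_divisor is
hypothetical and does not appear in the kernel source.

```c
#include <assert.h>

/*
 * Return the greatest power of two that divides n.
 * (~n + 1u) is the two's-complement negation of n; ANDing it with n
 * isolates the lowest set bit, which is exactly that divisor.
 */
unsigned int
largest_pow2_divisor(unsigned int n)
{
	assert(n != 0);
	return n & (~n + 1u);
}
```

For example, a probed count of 12 colors would be reduced to 4, while a
count that is already a power of two is returned unchanged.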


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


# 1.123 11-Feb-2017 maxv

Instead of using a global array with per-cpu indexes, embed the tmp VAs
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64},
because amd64 already has a direct map that is way faster than that.

There are two major issues with the global array: maxcpus entries are
allocated while it is unlikely that common i386 machines have so many
cpus, and the base VA of these entries is not cache-line-aligned, which
mostly guarantees cache-line-thrashing each time the VAs are entered.

Now the number of tmp VAs allocated is proportionate to the number of CPUs
attached (which therefore reduces memory consumption), and the base is
properly aligned.

On my 3-core AMD, the number of DC_refills_L2 events triggered when
performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on
average divided by two with this patch.

Discussed on tech-kern a little.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print a warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
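
The distinction between the base fields and the display values can be
sketched as follows. This is an illustration only: the function names are
hypothetical (they are not the macros listed above), and the combination
rule is the commonly documented one for the CPUID leaf 1 signature, where
the extended family is added only when the base family is 0xf and the
extended model is used only when the base family is 0x6 or 0xf.

```c
#include <stdint.h>

/* Base fields of the CPUID leaf 1 EAX signature. */
uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }

/* Extended fields of the same signature. */
uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: base family, plus the extended family if base is 0xf. */
uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t base = sig_basefamily(sig);

	return base == 0xf ? base + sig_extfamily(sig) : base;
}

/* Display model: extended model supplies the high nibble for 0x6 and 0xf. */
uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t fam = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

With the signature 0x000306a9, for instance, the base model is 0xa but the
display model is 0x3a, which is why code comparing against model numbers
needs to be clear about which of the two it is using.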


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shut down the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
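
For readers unfamiliar with the idiom, rounding up to a power-of-two boundary
can be sketched as below. This is an illustration of the general technique,
not the kernel's macro; round_up_pow2() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round x up to the next multiple of m,
 * where m must be a non-zero power of two.
 */
static uintptr_t
round_up_pow2(uintptr_t x, uintptr_t m)
{
	assert(m != 0 && (m & (m - 1)) == 0);
	/* Add m-1, then clear the low bits with a mask. */
	return (x + m - 1) & ~(m - 1);
}
```

The mask form works only for power-of-two alignments; other alignments need
a division-based round-up.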


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.
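
The selection described above can be sketched as follows. This is only an
illustration under stated assumptions: the routine names, the idle_select()
function and its boolean parameters are hypothetical, and the stub bodies
merely record which routine ran instead of executing HLT, MWAIT or a
hypercall.

```c
#include <stdbool.h>

/* Records which (hypothetical) idle routine ran last. */
enum idle_kind { IDLE_HALT, IDLE_MWAIT, IDLE_XEN };
static enum idle_kind last_idle;

/* Stubs standing in for the real idle routines. */
static void idle_halt(void)  { last_idle = IDLE_HALT; }
static void idle_mwait(void) { last_idle = IDLE_MWAIT; }
static void idle_xen(void)   { last_idle = IDLE_XEN; }

/* The idle entry point is a macro that calls through a pointer. */
static void (*idle_fn)(void) = idle_halt;
#define cpu_idle_sketch()	((*idle_fn)())

/* Pick the routine once, following the rules in the commit message. */
static void
idle_select(bool is_xen, bool has_mwait, bool is_amd)
{
	if (is_xen)
		idle_fn = idle_xen;
	else if (has_mwait && !is_amd)
		idle_fn = idle_mwait;
	else
		idle_fn = idle_halt;
}
```

Choosing the routine once at startup keeps the idle path free of feature
checks; callers only pay for an indirect call.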


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.
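
One way to compute "the greatest power of two which is a divisor" of a
number is to isolate its lowest set bit. The sketch below illustrates that
technique; it is not taken from the commit, and pow2_divisor() is a
hypothetical name.

```c
#include <assert.h>

/*
 * Hypothetical helper: largest power of two that divides n.
 * For unsigned n, -n is computed modulo 2^N, so n & -n keeps
 * only the lowest set bit of n.
 */
static unsigned
pow2_divisor(unsigned n)
{
	assert(n != 0);
	return n & -n;
}
```

For example, 24 colors would be reduced to 8, and an odd count to 1.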

This code might want to stay even after the cache probing code is fixed.


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


Revision tags: nick-nhusb-base-20170204
# 1.122 02-Feb-2017 maxv

Use __read_mostly on these variables, to reduce the probability of false
sharing.


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus with
AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
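
The difference between the base fields and the display values can be
sketched as below. This follows the commonly documented CPUID leaf 1
signature layout and the Intel convention for combining fields; the function
names are hypothetical and the code is not the kernel's macro definitions.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature (EAX). */
static unsigned sig_stepping(uint32_t sig)    { return sig & 0xf; }
static unsigned sig_basemodel(uint32_t sig)   { return (sig >> 4) & 0xf; }
static unsigned sig_basefamily(uint32_t sig)  { return (sig >> 8) & 0xf; }
static unsigned sig_extmodel(uint32_t sig)    { return (sig >> 16) & 0xf; }
static unsigned sig_extfamily(uint32_t sig)   { return (sig >> 20) & 0xff; }

/* Display family: the extended field is added only when base is 0xf. */
static unsigned
sig_family(uint32_t sig)
{
	unsigned bf = sig_basefamily(sig);

	return (bf == 0xf) ? bf + sig_extfamily(sig) : bf;
}

/*
 * Display model: the extended field supplies the high nibble when the
 * base family is 0x6 or 0xf (assumption: Intel's documented rule).
 */
static unsigned
sig_model(uint32_t sig)
{
	unsigned bf = sig_basefamily(sig);
	unsigned bm = sig_basemodel(sig);

	if (bf == 0x6 || bf == 0xf)
		return (sig_extmodel(sig) << 4) | bm;
	return bm;
}
```

For the signature 0x000306a9 this gives family 6, model 0x3a, stepping 9;
for 0x00100f42 it gives family 0x10, model 4, stepping 2.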


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shutdown the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@
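
The fallback described above can be sketched as a feature check in front of
two readers. Everything here is hypothetical and for illustration only: the
struct, the stub readers and counter_serializing_sketch() are not the
kernel's code, and the stubs return fixed values instead of touching
hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*counter_fn)(void);

/* Hypothetical description of what the CPU offers. */
struct counter_ops {
	bool		has_msr;	/* stands in for the CPUID_MSR check */
	counter_fn	read_msr_tsc;	/* stands in for rdmsr(MSR_TSC) */
	counter_fn	read_counter;	/* stands in for cpu_counter() */
};

/* Stub readers returning recognizable values. */
static uint64_t stub_msr_tsc(void) { return 100; }
static uint64_t stub_counter(void) { return 200; }

/* Use the MSR read when available, the plain counter otherwise. */
static uint64_t
counter_serializing_sketch(const struct counter_ops *ops)
{
	assert(ops != NULL);
	if (ops->has_msr)
		return ops->read_msr_tsc();
	return ops->read_counter();
}
```

On a CPU without MSR support the second branch is taken, so callers never
issue an instruction the CPU cannot execute.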


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits wide (64 GiB). Virtual address space remains at 32 bits (4 GiB). To
cope with the increased size of physical addresses, they are manipulated as
64-bit variables by the kernel and MMU.
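
The address-width arithmetic above can be checked with a small sketch. The
helper names below are hypothetical and are not taken from the NetBSD tree;
they only illustrate why a 36-bit physical address no longer fits in a
32-bit variable.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes addressable with `bits` address bits. */
static uint64_t
addressable_bytes(unsigned bits)
{
	return (uint64_t)1 << bits;
}

/*
 * Hypothetical helper: what is left of a physical address if it is
 * (incorrectly) stored in a 32-bit variable. Bits 32..35 are lost.
 */
static uint32_t
truncate_to_32(uint64_t pa)
{
	return (uint32_t)pa;
}
```

With these definitions, addressable_bytes(36) is 64 GiB and
addressable_bytes(32) is 4 GiB, while truncate_to_32() silently drops the
upper bits of any address above 4 GiB, which is the reason physical
addresses are widened to 64 bits under PAE.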

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_pae_l3_pdirpa (PA of the L3), and
ci_pae_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.
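
As a rough illustration of the operation that roundup2() replaces, the
sketch below rounds a value up to a multiple of a power of two with a mask.
The function name is hypothetical and the code is not copied from the
kernel; it assumes that m is a power of two, as the open-coded form does.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the open-coded operation: round x up to the
 * next multiple of m. Only valid when m is a power of two, because the
 * mask ~(m - 1) relies on m having a single bit set.
 */
static size_t
round_up_pow2(size_t x, size_t m)
{
	return (x + (m - 1)) & ~(m - 1);
}
```

Using a named helper rather than repeating the mask expression keeps the
power-of-two requirement in one place.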


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting of state of CPU to offline, if there are bound LWPs,
which have no CPU to migrate.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.
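
The workaround described above, picking the greatest power of two that
divides the desired number of cache colors, can be sketched as follows. The
function name is hypothetical and the code is an illustration of the
arithmetic, not the code from this revision.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the greatest power of two that divides n.
 * In unsigned arithmetic, n & (~n + 1) isolates the lowest set bit of n,
 * which is exactly that divisor. Returns 0 for n == 0.
 */
static uint32_t
greatest_pow2_divisor(uint32_t n)
{
	return n & (~n + 1u);
}
```

For example, a probe that reports 24 colors would be reduced to 8, while a
value that is already a power of two is returned unchanged.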


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shutdown other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.121 16-Oct-2016 maxv

Use the generic i82489_writereg instead of lapic_tpr, for consistency.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.120 07-Jul-2016 msaitoh

branches: 1.120.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226
# 1.119 16-Dec-2015 maxv

Extend SMEP support to i386 (does not require PAE).


# 1.118 13-Dec-2015 maxv

Implement amd64 support for SMEP - Supervisor Mode Execution Protection.

Now, on CPUs that support this feature, if the kernel tries to execute
an instruction located in userland, the CPU will trigger a page fault.

Tested on amd64 (Intel Core i5).


# 1.117 13-Dec-2015 maxv

Retrieve cpuid7 (Structured Extended Features) into ci_feat_val.


Revision tags: nick-nhusb-base-20150921
# 1.116 17-Sep-2015 nat

Don't disable/re-enable interrupts if they are already disabled.
Addresses PR 48196.

This commit was improved and approved by christos@


Revision tags: nick-nhusb-base-20150606
# 1.115 18-May-2015 msaitoh

OOOOPS. Revert previous.


# 1.114 18-May-2015 msaitoh

Workaround for "lapic_set_lvt: bad pin value %d" panic on some (broken?) BIOS
systems. Don't panic when a local APIC's interrupt input pin number (LINTx) > 1.
Instead, print warning message and continue. The default is pin 1.
Same as Linux (and perhaps FreeBSD). Tested with Shuttle DS57U.


Revision tags: nick-nhusb-base-20150406
# 1.113 12-Jan-2015 christos

PR/49104: Jarle Greipsland: Don't touch cr4 in cpus that don't have it.
XXX: pullup-7


# 1.112 08-Dec-2014 msaitoh

Modify the code around cpu_identify() so as not to break the dmesg of cpus
with AB_VERBOSE or AB_DEBUG.


Revision tags: nick-nhusb-base netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.111 12-May-2014 joerg

branches: 1.111.2; 1.111.4;
Match lapic conditionals from the primary CPU.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.110 25-Feb-2014 dsl

branches: 1.110.2;
Add support for saving the AVX-256 ymm registers during FPU context switches.
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.


# 1.109 19-Feb-2014 dsl

Add explicit #include <x86/fpu.h> instead of relying on pcb.h including it.


# 1.108 26-Jan-2014 dsl

Remove support for 'external' floating point units and the MS-DOS
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines go from the configs, arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!


# 1.107 01-Dec-2013 christos

revert fpu/pcu changes until we figure out what's wrong; they cause random
freezes


# 1.106 15-Nov-2013 msaitoh

Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bugs.

CPUID_TO_STEPPING(cpuid) (not changed)

CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
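
To make the difference between the base, extended and display values
concrete, here is a minimal sketch of the decoding. The function names are
hypothetical and this is not the text of the macros from this revision. It
assumes the usual CPUID leaf 1 EAX layout (stepping in bits 0-3, base model
in bits 4-7, base family in bits 8-11, extended model in bits 16-19,
extended family in bits 20-27) and the Intel convention that the extended
model applies to base families 6 and 15.

```c
#include <stdint.h>

/* Raw fields of the CPUID leaf 1 signature (EAX). Hypothetical names. */
static uint32_t sig_stepping(uint32_t sig)    { return sig & 0xf; }
static uint32_t sig_base_model(uint32_t sig)  { return (sig >> 4) & 0xf; }
static uint32_t sig_base_family(uint32_t sig) { return (sig >> 8) & 0xf; }
static uint32_t sig_ext_model(uint32_t sig)   { return (sig >> 16) & 0xf; }
static uint32_t sig_ext_family(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base family is 15. */
static uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);

	return (fam == 0xf) ? fam + sig_ext_family(sig) : fam;
}

/*
 * Display model: the extended model supplies the high nibble when the
 * base family is 6 or 15 (assumed convention, see the lead-in).
 */
static uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t fam = sig_base_family(sig);
	uint32_t model = sig_base_model(sig);

	if (fam == 0x6 || fam == 0xf)
		model |= sig_ext_model(sig) << 4;
	return model;
}
```

Keeping the raw field accessors separate from the display helpers mirrors
the split described in the commit message: callers that want the combined
value never have to repeat the family 6/15 special case themselves.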


# 1.105 12-Nov-2013 msaitoh

Revert previous. I accidentally committed debug code. Sorry.


# 1.104 12-Nov-2013 msaitoh

Fix a bug in last commit. Check correct variable.


# 1.103 23-Oct-2013 drochner

Use the MI "pcu" framework for bookkeeping of npx/fpu states on x86.
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.102 12-Dec-2012 pgoyette

branches: 1.102.2;
With recent introduction of conditionals for the various MP options, we
broke the build for x86 systems that have MULTIPROCESSOR but which do not
include MPBIOS. So let's try to untangle things just a bit. Presented
on current-users (and referenced on source-changes-d) without any comment.

XXX We really should find a better method to select kernel options; #ifdef
spaghetti is rather sub-optimal.


# 1.101 08-Dec-2012 kiyohara

#ifdef - #endif-ed. NMCA, NISA, NNPX, NIOAPIC, LAPIC, MPBIOS and MULTIPROCESSOR.


Revision tags: yamt-pagecache-base6
# 1.100 02-Jul-2012 chs

branches: 1.100.2;
in cpu_boot_secondary_processors(), wait until all the other CPUs
have registered themselves in kcpuset_running before returning.
recent changes to the TLB invalidation xcall code assume that
any CPU which will receive a broadcast IPI is registered in
kcpuset_running, so ensure that is true by waiting here.


# 1.99 12-Jun-2012 yamt

cpu_load_pmap: disable interrupts. add a comment to explain why. PR/44995


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9
# 1.98 20-Apr-2012 rmind

- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use
kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the
limitation of maximum CPUs.

- Support up to 256 CPUs on amd64 architecture by default.

Bug fixes, improvements, completion of Xen part and testing on 64-core
AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs)
by Manuel Bouyer.


Revision tags: yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2
# 1.97 17-Feb-2012 bouyer

Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.

2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.

To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.


Revision tags: jmcneill-usbmp-pre-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.96 18-Oct-2011 jruoho

branches: 1.96.2; 1.96.6; 1.96.8;
As cpu_shutdown() is a wrapper to cpu_suspend(), modify slightly to prevent
setting low frequencies for active non-bootstrap processors during shutdown.


# 1.95 17-Oct-2011 jmcneill

add a "vm" device class for cpufeaturebus


# 1.94 06-Oct-2011 mrg

remove a check against uvmexp.ncolors that is done inside uvm_page_recolor()
already anyway.


# 1.93 28-Sep-2011 jruoho

Call cpufreq_suspend(9) and cpufreq_resume(9) during suspend/resume.


Revision tags: jym-xensuspend-nbase jym-xensuspend-base
# 1.92 11-Aug-2011 cherry

Unbreak the build. (conflicting types in function declaration and definition)

Thanks riz@


# 1.91 11-Aug-2011 cherry

Hide the MD details of specific IPIs behind semantically pleasing functions. This cleans up a couple of #ifdef XEN/#endif pairs


# 1.90 29-Jul-2011 dyoung

Don't shutdown the bootstrap processor (BSP) because we may have to run
BIOS methods on it. For example, ACPI requires that we execute the code
for changing sleep state on the BSP.

This may help the problem where folks' machines would hang instead of
powering off when they entered ACPI sleep state 5.

XXX If the BSP is already shut down, we should start it back up.


# 1.89 22-Jun-2011 jruoho

Add small comment.


# 1.88 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase rmind-uvmplock-base
# 1.87 26-Feb-2011 jruoho

branches: 1.87.2;
Use config_defer(9) for cpu_rescan() in cpu_attach().
Also mark few local functions as static.


# 1.86 24-Feb-2011 jruoho

Fix autoconf(9) of cpufeaturebus.


# 1.85 24-Feb-2011 jruoho

Move VIA_C7TEMP to the cpufeaturebus.


# 1.84 24-Feb-2011 jruoho

Move PowerNow! to the cpufeaturebus.


# 1.83 23-Feb-2011 jruoho

Move ENHANCED_SPEEDSTEP, or henceforth est(4), to the cpufeaturebus.


# 1.82 20-Feb-2011 jruoho

Modularize coretemp(4). Ok jmcneill@.


# 1.81 19-Feb-2011 jmcneill

modularize VIA PadLock support
- retire options VIA_PADLOCK, replace with 'padlock0 at cpu0'
- driver supports attach & detach
- support building as a module


Revision tags: uebayasi-xip-base7 bouyer-quota2-base
# 1.80 02-Feb-2011 bouyer

Some CPUs have a cpu counter (CPUID_TSC is there) but don't handle the
rdmsr instruction (CPUID_MSR is not there).
Introduce a cpu_counter_serializing() function to replace rdmsr(MSR_TSC)
calls, which does a rdmsr(MSR_TSC) if available and cpu_counter() otherwise.
This makes the cpu counter usable on vortex86 CPUs.
OK ad@


Revision tags: jruoho-x86intr-base
# 1.79 11-Jan-2011 jruoho

branches: 1.79.2; 1.79.4;
Use pmf_device_register1(9) and add cpu_shutdown(), which calls cpu_suspend().


Revision tags: matt-mips64-premerge-20101231 uebayasi-xip-base6 uebayasi-xip-base5
# 1.78 06-Nov-2010 uebayasi

Machine dependent code is considered as part of UVM. Include
internal API header.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11
# 1.77 20-Aug-2010 jruoho

Revert all previous changes that were made naively believing that the
existing CPU power management implementations could peacefully coexist with
the acpicpu(4) driver. The following options can not be used with acpicpu(4):
ENHANCED_SPEEDSTEP, INTEL_ONDEMAND_CLOCKMOD, POWERNOW_K7, and POWERNOW_K8.


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.76 09-Aug-2010 jruoho

Revert the previous changes to EST. The used hack had an obvious flaw:
the acpicpu(4) driver should attach even if the existing frequency management
code fails to attach, mainly because ACPI is the only proper way to deal
with EST on new Intel systems.

Use a more drastic hack to deal with this: when acpicpu(4) attaches, it tears
down any existing sysctl(8) controls and installs identical ones in place.
Upon detachment, the initialization function of the existing EST is called.


# 1.75 09-Aug-2010 jruoho

Move the sysctl function pointers used by acpicpu(4) to x86/cpu.c.
Rename these so that the same pointers may be used in other parts.


# 1.74 04-Aug-2010 jruoho

Store the MADT-derived CPU ID to <x86/cpu.h>. This is required to properly
match the ACPI processor object ID with the ID available in the APIC table.


# 1.73 24-Jul-2010 jym

Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64-bit variables by kernel and MMU.

When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64 bits atomic functions for pmap

- all pmap_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objection raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).


# 1.72 08-Jul-2010 rmind

cpu_attach: use kmem_zalloc instead of memset.


# 1.71 06-Jul-2010 cegger

Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.


Revision tags: uebayasi-xip-base1
# 1.70 18-Apr-2010 jym

This patch fixes the NX regression issue observed on amd64 kernels, where
per-page execution right was disabled (therefore leading to the inability
of the kernel to detect fraudulent use of memory mappings marked as not
being executable).

- replace cpu_feature and ci_feature_flags variables by cpu_feature and
ci_feat_val arrays. This makes it cleaner and brings kernel code closer
to the design of cpuctl(8). A warning will be raised for each CPU that
does not expose the same features as the Boot Processor (BP).

- the blacklist of CPU features is now a macro defined in the
specialreg.h header, instead of hardcoding it inside MD initialization
code; fix comments.

- replace checks against CPUID_TSC with the cpu_hascounter() function.

- clean up the code in init_x86_64(), as cpu_feature variables are set
inside cpu_probe().

- use cpu_init_msrs() for i386. It will be eventually used later for NX
feature under i386 PAE kernels.

- remove code that checks for CPUID_NOX in amd64 mptramp.S, this is already
performed by cpu_hatch() through cpu_init_msrs().

- remove cpu_signature and feature_flags members from struct mpbios_proc
(they were never used).

This patch was tested with i386 MONOLITHIC, XEN3PAE_DOM0 and XEN3_DOM0 under
a native i386 host, and amd64 GENERIC, XEN3_DOM0 via QEMU virtual machines.

XXX Should kernel rev be bumped?

XXX A similar patch should be pulled-up for NetBSD-5, hopefully tomorrow.


Revision tags: yamt-nfs-mp-base9
# 1.69 24-Feb-2010 dyoung

branches: 1.69.2;
A pointer typedef entails trading too much flexibility to declare const
and non-const types, and the kernel uses both const and non-const
PMF qualifiers and device suspensors, so change the pmf_qual_t and
device_suspensor_t typedefs from "pointers to const" to non-pointer,
non-const types.


# 1.68 09-Feb-2010 jym

Wrap a comment; add a space after a comma to another (align with next line)


# 1.67 09-Feb-2010 jym

Use roundup2() instead of hardcoding the operation.


Revision tags: uebayasi-xip-base
# 1.66 08-Jan-2010 dyoung

branches: 1.66.2;
Expand PMF_FN_* macros.


Revision tags: matt-premerge-20091211
# 1.65 21-Nov-2009 rmind

Use lwp_getpcb() on x86 MD code, clean from struct user usage.


# 1.64 07-Nov-2009 cegger

Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.63 27-Mar-2009 drochner

Rearrange TSC inter-CPU synchronization code so that the gory details
are dealt with in x86/tsc.c and callers don't have to care that much.
Also add some comments and make some variables static.
approved by ad (a while ago)


Revision tags: nick-hppapmap-base2
# 1.62 21-Jan-2009 bouyer

branches: 1.62.2;
Make i386 config without NPX work, problem reported and fix tested by
Wojciech Galazka.
While there change a __i386__ to i386 for consistency.


Revision tags: mjf-devfs2-base
# 1.61 23-Dec-2008 cegger

move from malloc to kmem


# 1.60 19-Dec-2008 ad

PR kern/40213 my i386 machine can't boot because of tsc

- Patch in atomic_cas_64() twice. The first patch is early and makes
the MP-atomic version available if we have cmpxchg8b. The second patch
strips the lock prefix if ncpu==1.

- Fix the i486 atomic_cas_64() to not unconditionally enable interrupts.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.59 06-Nov-2008 cegger

Link cpus in the order they are attaching and not in inverse order.


# 1.58 31-Oct-2008 rmind

- Avoid the race with CPU online/offline state changes, when setting the
affinity (cpu_lock protects these operations now).
- Disallow setting the state of a CPU to offline, if there are bound LWPs
which have no CPU to migrate to.
- Disallow setting of affinity for the LWP(s), if all CPUs in the dynamic
CPU-set are offline.
- sched_setaffinity: fix invalid check of kcpuset_isset().
- Rename cpu_setonline() to cpu_setstate().

Should fix PR/39349.


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1
# 1.57 15-Oct-2008 ad

branches: 1.57.2; 1.57.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base wrstuden-revivesa-base
# 1.56 03-Jun-2008 jmcneill

branches: 1.56.4;
If we boot with RB_MD1, register a NULL pmf handler for APs so we can
still suspend.


Revision tags: yamt-pf42-base3
# 1.55 02-Jun-2008 ad

- Don't bother using sse to copy/zero pages on demand. It turns out not
to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and
use the sse instructions so that we don't blow out the cache.


# 1.54 28-May-2008 ad

Remove X86_MAXPROCS. There is still a 32-cpu limit, but it's now using
the MI constants.


# 1.53 21-May-2008 ad

Do the errata patchup after identifying the CPU, to avoid badly formatted
output.


# 1.52 21-May-2008 ad

verbose -> debug for # page colours


Revision tags: hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.51 14-May-2008 ad

- cpu_attach: ensure that the boot processor is set up before trying to
initialize APs. We need the lapic set up and the boot processor may
not be attached first.

- mp_cpu_start: write back and invalidate the data cache before starting the
init IPI sequence. If a buggy BIOS has left the AP with cache disabled,
it might not be able to participate in the cache coherency protocol.


# 1.50 13-May-2008 ad

Be more conservative during AP startup. Don't let the AP access the lapic
or do any setup until the boot processor has finished the init sequence,
and add a few more delays.


# 1.49 12-May-2008 ad

- Make cpu_number() return MI index, otherwise the pmap cannot work on
systems with lapic IDs > X86_MAXPROCS.
- Kill cpu_info[] array and use MI cpu_lookup_byindex().


# 1.48 12-May-2008 ad

Don't crash if more than 32 cpus. Hopefully the boot processor will be
within the first 32 attached.


# 1.47 12-May-2008 ad

- Complain if unable to reset the lapic ID.
- Minor clean up.


# 1.46 12-May-2008 ad

cpu_hatch: hack around problem with multiple CPUs spinning in i8254_delay.


# 1.45 11-May-2008 ad

- Decouple the APIC ID from cpu_info[].
- Probe TSC frequency on each AP when hatching.


# 1.44 11-May-2008 ad

MP + apics are needed now so kill the #ifdefs


# 1.43 11-May-2008 ad

Don't reload LDTR unless a new value, which only happens for USER_LDT.


# 1.42 11-May-2008 ad

Stop using APIC IDs to identify CPUs for software purposes. Allows for
APIC IDs beyond 31, which has been possible for some time now.


# 1.41 11-May-2008 ad

Share cpu.h between the x86 ports.


# 1.40 11-May-2008 ad

Simplify x86 identcpu code, and share between i386/amd64.


# 1.39 10-May-2008 ad

If the boot processor's lapic has the wrong ID, reset it.


# 1.38 10-May-2008 ad

Improve x86 tsc handling:

- Ditch the cross-CPU calibration stuff. It didn't work properly, and it's
near impossible to synchronize the CPUs in a running system, because bus
traffic will interfere with any calibration attempt, messing up the
timings.

- Only enable the TSC on CPUs where we are sure it does not drift. If we are
on a known good CPU, give the TSC high timecounter quality, making it the
default.

- When booting CPUs, detect TSC skew and account for it. Most Intel MP
systems have synchronized counters, but that need not be true if the
system has a complicated bus structure. As far as I know, AMD systems
do not have synchronized TSCs and so we need to handle skew.

- While an AP is waiting to be set running, try and make the TSC drift by
entering a reduced power state. If we detect drift, ensure that the TSC
does not get a high timecounter quality. This should not happen and is
only for safety.

- Make cpu_counter() stuff LKM safe.


# 1.37 09-May-2008 joerg

Make cpu_idle a macro calling a function pointer on x86.
Select the Xen idle routine for Xen, mwait if supported by the CPU and
it is not AMD and halt otherwise. As reported by Christoph Egger,
AMD Barcelona keeps the CPU in C0 state with MWAIT, contrary to HLT,
which uses C1 and therefore much less power.


# 1.36 29-Apr-2008 ad

branches: 1.36.2;
Minor correction to previous.


# 1.35 29-Apr-2008 ad

Recognise two new boot flags:

-1 disable MP
-2 disable ACPI


# 1.34 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.33 24-Apr-2008 jmcneill

branches: 1.33.2;
Gracefully handle a condition where apic id >= X86_MAXPROCS rather than
panicking.


# 1.32 22-Apr-2008 tls

Commit a quick workaround for the not-power-of-two cache colors problem
pointed out by Simon (Simon's option #3): use the greatest power of two
which is a divisor of the desired number of cache colors.

This code might want to stay even after the cache probing code is fixed.


Revision tags: yamt-pf42-baseX yamt-pf42-X yamt-pf42-base
# 1.31 18-Apr-2008 cegger

branches: 1.31.2;
g/c unused ioapic_bsp_id.
Per discussion with bouyer.


# 1.30 17-Apr-2008 cegger

wrap long line. Requested and OK by simonb.


# 1.29 17-Apr-2008 yamt

cpu_debug_dump: s/curproc/curlwp/ in a message.


# 1.28 17-Apr-2008 cegger

use aprint_*_dev.
OK simonb


# 1.27 16-Apr-2008 cegger

- use aprint_*_dev and device_xname
- use POSIX integer types


# 1.26 13-Apr-2008 cegger

use device accessors and other misc cleanups


# 1.25 02-Apr-2008 ad

Add more error reporting to AP startup.


# 1.24 01-Apr-2008 ad

If MPDEBUG and waiting for the CPU to start, dump cpu_trace[] as it changes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.23 04-Mar-2008 cube

Split device_t/softc.


# 1.22 29-Feb-2008 dyoung

Use PMF_FN_ARGS, PMF_FN_PROTO.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 10-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Align cc_microtime and struct cpu_info to 64b.


# 1.20 30-Jan-2008 jmcneill

pmf: Naively track online/offline state of APs during suspend/resume.


# 1.19 23-Jan-2008 joerg

Initialise the Local Vector Table of the primary LAPIC directly after
enabling it. Explicitly initialise LINT0 as ExtInt and LINT1 as NMI,
the platform default. Mask the NMIs on the application processors and
mask the ExtInt if an IOAPIC was found.

With this patch, "disable ioapic" is supposed to work and it will allow
enabling the local APIC on all systems that have one to gain e.g. the
better clock interrupt.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.18 15-Jan-2008 joerg

Introduce optional cpu_offline_md to execute MD actions at the end of
cpu_offline. Use this on amd64/i386 to force a FPU save. As this was
triggered by npxsave_cpu/fpusave_cpu not working for a different CPU,
remove the cpu_info argument and adjust npxsave_*/fpusave_* to use bool
for the save.

OK ad@


# 1.17 14-Jan-2008 joerg

Ensure that non-primary CPUs save the FPU state on suspend.


Revision tags: matt-armv6-base
# 1.16 05-Jan-2008 yamt

- make amd64 use per-cpu tss.
- fix iopl syscall for amd64+xen.


# 1.15 04-Jan-2008 yamt

i386:
- make tss per-cpu. this considerably speeds up context switch for,
at least, pentium4, where ltr instruction seems very slow.
i386, xen:
- kill cpu_maxproc.
kvm86:
- adapt to per-cpu tss.
- cleanup and simplify.
- move kvm86_mp_lock to more meaningful place.
- disable preemption during a call.


Revision tags: vmlocking2-base3
# 1.14 18-Dec-2007 joerg

Add new IPI for saving CPU state explicitly, share high-level part of
ACPI wakeup code and teach it how to start the APs again. As a side
effect the CPU_START interface allows choosing between different
bootstrap codes more easily now.


# 1.13 15-Dec-2007 joerg

For now, remove the attempts to shut down other CPUs and bring them back
online. It runs into issues in the pmap code and will be handled
differently. This allows sysctl -w machdep.sleep_state=3 to at least
recover into a working system again.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.12 09-Dec-2007 jmcneill

branches: 1.12.2;
Merge jmcneill-pm branch.


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase reinoud-bufcleanup-base jmcneill-pm-base
# 1.11 04-Dec-2007 ad

branches: 1.11.2;
- Fix the locking around the i8254. Values for the TSC clock and lapic
delay function were wildly inaccurate due to multiple CPUs competing
in DELAY() during calibration, confusing the clock chip.
- Use i8254_delay() explicitly in a few more places.


Revision tags: vmlocking2-base1 vmlocking-nbase
# 1.10 02-Dec-2007 ad

branches: 1.10.2;
Back out part of patch that got merged accidentally.


# 1.9 02-Dec-2007 ad

Use atomics to adjust ci_flags.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.8 14-Nov-2007 ad

cpu_hatch: change lapic initialization order.


# 1.7 13-Nov-2007 ad

In cpu_hatch(), recompute ci_tsc_freq instead of using the boot CPU's value.


# 1.6 12-Nov-2007 ad

- cpu_vendor was both an int and char[] on amd64 - fix it.
- Run the errata check/patch on all CPUs, not just the boot processor.


# 1.5 10-Nov-2007 ad

- When computing the TSC frequency, call i8254_delay() and not DELAY().
- Use atomics to adjust the pmap reference count, instead of taking locks.
- Implement I386_{SET,GET}_{FS,GS}BASE, allowing %fs and %gs to be used
as per-thread registers. This is compatible with FreeBSD.
- Run patches after we have attached CPUs, since we then know if the
system is uniprocessor or not. Eliminates a lot of #ifdef MULTIPROCESSOR
and makes running MP kernels on UP systems cheaper.
- Patch out many of the 'lock' prefixes to nops if uniprocessor.
- Do a wbinvd after patching to ensure that the trace/instruction cache
is up to date.


Revision tags: jmcneill-base
# 1.4 18-Oct-2007 yamt

branches: 1.4.2; 1.4.4;
merge yamt-x86pmap branch.

- reduce differences between amd64 and i386. notably, share pmap.c
between them. it makes several i386 pmap improvements available to
amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available.
also, make it work on amd64.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 vmlocking-base yamt-x86pmap-base2
# 1.3 26-Sep-2007 ad

branches: 1.3.2;
x86 changes for pcc and LKMs.

- Replace most inline assembly with proper functions. As a side effect
this reduces the size of amd64 GENERIC by about 120kB, and i386 by a
smaller amount. Nearly all of the inlines did something slow, or something
that does not need to be fast.
- Make curcpu() and curlwp functions proper, unless __GNUC__ && _KERNEL.
In that case make them inlines. Makes curlwp LKM and preemption safe.
- Make bus_space and bus_dma more LKM friendly.
- Share a few more files between the ports.
- Other minor changes.


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base
# 1.2 29-Aug-2007 ad

branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8;
Merge most x86 changes from the vmlocking branch, except the threaded soft
interrupt stuff. This is mostly comprised of changes to the pmap modules to
work on multiprocessor systems without kernel_lock, and changes to speed up
tlb shootdowns.


# 1.1 23-Aug-2007 ad

branches: 1.1.2;
file cpu.c was initially added on branch vmlocking.