History log of /openbsd-current/sys/arch/mips64/include/cpu.h
# 1.147 09-Jun-2024 jca

Add a compiler barrier where missing in CPU_BUSY_CYCLE() implems

Having differences between architectures is asking for problems. And
adding a barrier here just makes sense in most cases. This is also what
cpu_relax() provides in Linux land.

ok kettenis@ claudio@
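
As a rough illustration of why a barrier matters in a busy loop, here is a minimal sketch. BUSY_CYCLE() and spin_until_set() are hypothetical names, not the macros from this header, and the empty asm statement with a "memory" clobber is the usual GCC/Clang idiom for a compiler barrier; the real per-architecture definitions may differ.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for CPU_BUSY_CYCLE(): an empty asm statement with
 * a "memory" clobber. It emits no instructions, but the compiler may not
 * keep memory values cached in registers across it (GCC/Clang extension).
 */
#define BUSY_CYCLE()	__asm__ volatile("" ::: "memory")

/*
 * Spin until *flag becomes true or max_spins iterations have passed.
 * Returns the number of iterations performed.
 */
static unsigned int
spin_until_set(const bool *flag, unsigned int max_spins)
{
	unsigned int n = 0;

	while (!*flag && n < max_spins) {
		BUSY_CYCLE();	/* *flag is reloaded on the next test */
		n++;
	}
	return n;
}
```

Without the barrier, a compiler is free to read *flag once and spin on the stale value, which is the kind of cross-architecture difference the commit wants to avoid.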


Revision tags: OPENBSD_7_5_BASE
# 1.146 25-Feb-2024 cheloha

clockintr: rename "struct clockintr_queue" to "struct clockqueue"

The code has outgrown the original name for this struct. Both the
external and internal APIs have used the "clockqueue" namespace for
some time when operating on it, and that name is eyeball-consistent
with "clockintr" and "clockrequest", so "clockqueue" it is.


# 1.145 24-Jan-2024 cheloha

clockintr: switch from callee- to caller-allocated clockintr structs

Currently, clockintr_establish() calls malloc(9) to allocate a
clockintr struct on behalf of the caller. mpi@ says this behavior is
incompatible with dt(4). In particular, calling malloc(9) during the
initialization of a PCB outside of dt_pcb_alloc() is (a) awkward and
(b) may conflict with future changes/optimizations to PCB allocation.

To side-step the problem, this patch changes the clockintr subsystem
to use caller-allocated clockintr structs instead of callee-allocated
structs.

clockintr_establish() is named after softintr_establish(), which uses
malloc(9) internally to create softintr objects. The clockintr subsystem
is no longer using malloc(9), so the "establish" naming is no longer apt.
To avoid confusion, this patch also renames "clockintr_establish" to
"clockintr_bind".

Requested by mpi@. Tweaked by mpi@.

Thread: https://marc.info/?l=openbsd-tech&m=170597126103504&w=2

ok claudio@ mlarkin@ mpi@
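
The caller-allocated pattern described above can be sketched as follows. Everything here (struct timer, timer_bind(), struct probe) is a hypothetical illustration and not the clockintr API; the point is only that the owner embeds the storage, so binding needs no allocator.

```c
/* Hypothetical timer object; not the real struct clockintr. */
struct timer {
	void (*t_func)(void *);
	void *t_arg;
	int t_bound;
};

/*
 * Caller-allocated style: the caller owns the storage, so binding needs
 * no allocator and cannot fail for lack of memory.
 */
static void
timer_bind(struct timer *t, void (*func)(void *), void *arg)
{
	t->t_func = func;
	t->t_arg = arg;
	t->t_bound = 1;
}

/* Example owner that embeds the timer in its own structure. */
struct probe {
	int p_hits;
	struct timer p_timer;	/* storage lives inside the owner */
};

/* Callback run when the timer fires. */
static void
probe_tick(void *arg)
{
	struct probe *p = arg;

	p->p_hits++;
}
```

A caller would write timer_bind(&p.p_timer, probe_tick, &p) during its own initialization, with no separate allocation step to sequence or unwind.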


Revision tags: OPENBSD_7_4_BASE
# 1.144 23-Aug-2023 cheloha

all platforms: separate cpu_initclocks() from cpu_startclock()

To give the primary CPU an opportunity to perform clock interrupt
preparation in a machine-independent manner we need to separate the
"initialization" parts of cpu_initclocks() from the "start the clock
interrupt" parts. Currently, cpu_initclocks() does everything all at
once, so there is no space for this MI setup.

Many platforms have more-or-less already done this separation by
implementing a separate routine named "cpu_startclock()". This patch
promotes cpu_startclock() from de facto standard to mandatory API.

- Prototype cpu_startclock() in sys/systm.h alongside cpu_initclocks().
The separation of responsibility between the two routines is a bit
fuzzy but the basic guidelines are as follows:

+ cpu_initclocks() must initialize hz, stathz, and profhz, and call
clockintr_init().

+ cpu_startclock() must call clockintr_cpu_init() and start the clock
interrupt cycle on the calling CPU.

These guidelines will shift in the future, but that's the way things
stand as of *this* commit.

- In initclocks(): first call cpu_initclocks(), then do MI setup, and
last call cpu_startclock().

- On platforms where cpu_startclock() already exists: don't call
cpu_startclock() from cpu_initclocks() anymore.

- On platforms where cpu_startclock() doesn't yet exist: implement it.
Usually this is as simple as dividing cpu_initclocks() in two.

Tested on amd64 (i8254, lapic), arm64, i386 (i8254, lapic), macppc,
mips64/octeon, and sparc64. Tested on arm/armv7 (agtimer(4)) by
phessler@ and jmatthew@. Tested on m88k/luna88k by aoyama@. Tested
on powerpc64 by gkoehler@ and mlarkin@. Tested on riscv64 by
jmatthew@.

Thread: https://marc.info/?l=openbsd-tech&m=169195251322149&w=2
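
The ordering in initclocks() described above can be sketched with stubs. The *_stub and *_sketch names are hypothetical and the bodies only record the call order; they do not reproduce the kernel routines.

```c
#include <string.h>

/* Records the order in which the stubs below run. */
static char boot_trace[8];

static void
trace(char c)
{
	size_t len = strlen(boot_trace);

	if (len + 1 < sizeof(boot_trace))
		boot_trace[len] = c;
}

/* Stubs standing in for the MD and MI pieces; bodies are illustrative. */
static void cpu_initclocks_stub(void) { trace('i'); }	/* set hz etc. */
static void mi_clock_setup_stub(void) { trace('m'); }	/* MI preparation */
static void cpu_startclock_stub(void) { trace('s'); }	/* start interrupts */

/* Mirrors the ordering described for initclocks(). */
static void
initclocks_sketch(void)
{
	cpu_initclocks_stub();
	mi_clock_setup_stub();
	cpu_startclock_stub();
}
```

The MI step sits between the two MD calls, which is only possible once starting the interrupt cycle is no longer buried inside cpu_initclocks().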


# 1.143 05-Aug-2023 guenther

cpu_idle_{enter,leave} are no-ops on mips64, so just #define
away the calls

ok jca@
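
A minimal sketch of defining hooks away, assuming the common do/while(0) idiom. The idle_enter()/idle_leave() names and the loop are hypothetical, and the actual definitions in this header may be spelled differently.

```c
/*
 * Each call expands to an empty statement, so callers compile unchanged
 * and no function call is emitted.
 */
#define idle_enter()	do { /* nothing */ } while (0)
#define idle_leave()	do { /* nothing */ } while (0)

/* Hypothetical idle loop body; returns the number of idle cycles run. */
static int
idle_loop_sketch(int cycles)
{
	int i, done = 0;

	idle_enter();
	for (i = 0; i < cycles; i++)
		done++;
	idle_leave();
	return done;
}
```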


# 1.142 25-Jul-2023 cheloha

statclock: move profil(2), GPROF code to profclock(), gmonclock()

This patch isolates profil(2) and GPROF from statclock(). Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio). We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.

- Move profil(2)-specific code from statclock() to a new clock
interrupt callback, profclock(), in subr_prof.c. Each
schedstate_percpu has its own profclock handle. The profclock is
enabled/disabled for a given CPU when it is needed by the running
thread during mi_switch() and sched_exit().

- Move GPROF-specific code from statclock() to a new clock interrupt
callback, gmonclock(), in subr_prof.c. Where available, each cpu_info
has its own gmonclock handle. The gmonclock is enabled/disabled for
a given CPU via sysctl(2) in prof_state_toggle().

- Both profclock() and gmonclock() have a fixed period, profclock_period,
that is initialized during initclocks().

- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
and clockintr_stagger() via <sys/clockintr.h>. They have external
callers now.

- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete
spc_pscnt and spc_psdiv. The statclock frequency is not dynamic
anymore so these variables are now useless.

- Delete code/state related to the dynamic statclock frequency from
kern_clockintr.c. The statclock frequency can still be pseudo-random,
so move the contents of clockintr_statvar_init() into clockintr_init().

With input from miod@, deraadt@, and claudio@. Early revisions
cleaned up by claudio. Early revisions tested by claudio@. Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation
bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on
powerpc64 by gkoehler@.


Revision tags: OPENBSD_7_3_BASE
# 1.141 11-Jan-2023 visa

Add TLB bypass for instruction emulation

copyinsn() fetches a userland instruction through the direct map.
This lets emulation work with execute-only virtual memory mappings.

OK deraadt@


# 1.140 19-Nov-2022 cheloha

mips64, loongson, octeon: switch to clockintr

- Remove mips64-specific clock interrupt scheduling bits from cpu_info.
- Add missing tick_nsec initialization to cpu_initclocks().
- Disable the glxclk interrupt clock on loongson. visa@/miod@ say it
can be removed later if it isn't useful for anything else.
- Wire up cp0_intrclock.

Notes:

- The loongson apm_suspend() changes are untested, but deraadt@ claims
APM suspend/resume on loongson doesn't work anyway.
- loongson and octeon now have a randomized statclock(), stathz = hz.

With input from miod@, visa@. Tested by miod@, visa@.

Link: https://marc.info/?l=openbsd-tech&m=166776379603497&w=2

ok visa@ mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.139 22-Aug-2022 cheloha

mips64, octeon, loongson: trigger deferred clock interrupts from splx(9)

As with powerpc, powerpc64, and riscv64, on mips64 platforms we need
to isolate the clock interrupt schedule from the MD clock interrupt
code. To do this, we need to stop deferring clock interrupt work
until the next tick and instead defer the work until we logically
unmask the clock interrupt from splx(9).

Add a boolean (ci_clock_deferred) to the cpu_info struct to note
whether we need to trigger the clock interrupt by hand, and then
do so from splx(9) by calling md_triggerclock().

Currently md_triggerclock is only ever set to cp0_trigger_int5(). The
routine takes great care to ensure that INT5 has fired or will fire
before returning.

There are some loongson machines that use glxclk instead of CP0. They
can be switched to use CP0 later.

With input and advice from visa@ and miod@.

Compiled and extensively tested by visa@ and miod@ on various octeon
and loongson machines. No issues seen on octeon machines. miod@ saw
some odd things on loongson, but suggests that all issues are
probably unrelated to this patch.

Link: https://marc.info/?l=openbsd-tech&m=165929192702632&w=2

ok visa@, miod@
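
The deferral scheme described above can be modelled in userland roughly as follows. Only ci_clock_deferred is named in the log; the struct, the IPL values, and the *_sketch functions are hypothetical, and trigger_clock() here runs the handler directly, whereas the real md_triggerclock() makes the hardware interrupt fire.

```c
#define IPL_NONE	0
#define IPL_CLOCK	5	/* illustrative level, not the real value */

/* Hypothetical per-CPU state. */
struct cpu_sketch {
	int ci_ipl;		/* current interrupt priority level */
	int ci_clock_deferred;	/* clock fired while it was masked */
	int ci_clock_runs;	/* times the clock work actually ran */
};

/* Stand-in for md_triggerclock(): here it just runs the work. */
static void
trigger_clock(struct cpu_sketch *ci)
{
	ci->ci_clock_runs++;
}

/* Clock interrupt entry: run now if unmasked, else note the deferral. */
static void
clock_intr(struct cpu_sketch *ci)
{
	if (ci->ci_ipl >= IPL_CLOCK) {
		ci->ci_clock_deferred = 1;
		return;
	}
	ci->ci_clock_runs++;
}

/* Lower the IPL; if that unmasks the clock and work is pending, run it. */
static void
splx_sketch(struct cpu_sketch *ci, int newipl)
{
	ci->ci_ipl = newipl;
	if (ci->ci_clock_deferred && newipl < IPL_CLOCK) {
		ci->ci_clock_deferred = 0;
		trigger_clock(ci);
	}
}
```

The deferred work runs as soon as the clock is logically unmasked, instead of waiting for the next tick.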


Revision tags: OPENBSD_7_1_BASE
# 1.138 28-Jan-2022 visa

Remove unused guarded read and write routines.

No objection from miod@


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Born from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@
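
The "first panic wins" rule for the per-CPU buffer can be sketched like this. The struct and panic_record() are hypothetical; only the ci_panicbuf name and the 512-byte size come from the log, and console output and the atomic panicstr handling are left out.

```c
#include <stdio.h>
#include <string.h>

#define PANICBUF_LEN	512

/* Hypothetical per-CPU record holding the panic buffer. */
struct cpu_panic_sketch {
	char ci_panicbuf[PANICBUF_LEN];
};

/*
 * Record a panic message on this CPU. Only the first message is stored;
 * later messages leave the buffer untouched.
 * Returns 1 if the buffer was written, 0 otherwise.
 */
static int
panic_record(struct cpu_panic_sketch *ci, const char *msg)
{
	if (ci->ci_panicbuf[0] != '\0')
		return 0;	/* keep the first failure intact */
	snprintf(ci->ci_panicbuf, sizeof(ci->ci_panicbuf), "%s", msg);
	return 1;
}
```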


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.
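
A userland sketch of the "bits that change" idea follows. The mixing expression, messybits_from(), and rnd_messybits_sketch() are assumptions for illustration, and timespec_get() (C11) stands in for the kernel's nanotime(); an MD replacement would read a cycle counter instead.

```c
#include <stdint.h>
#include <time.h>

/* Fold seconds and nanoseconds so the fast-changing bits dominate. */
static uint32_t
messybits_from(uint64_t sec, uint64_t nsec)
{
	return (uint32_t)(nsec ^ (sec << 20));
}

/*
 * Stand-in for the first-step backend: read a clock and mix it.
 * The result is not a timestamp; callers only want changing bits.
 */
static uint32_t
rnd_messybits_sketch(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return messybits_from((uint64_t)ts.tv_sec, (uint64_t)ts.tv_nsec);
}
```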


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, the cpu_number() #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@
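
A small example of why macro arguments get parentheses. The macros here are hypothetical and are not taken from this header.

```c
/* Unguarded: the argument is pasted as-is, so precedence can bite. */
#define DOUBLE_BAD(x)	(x * 2)
/* Guarded: parentheses keep the argument a single operand. */
#define DOUBLE_OK(x)	((x) * 2)

/* Expands to (1 + 2 * 2). */
static int
bad_result(void)
{
	return DOUBLE_BAD(1 + 2);
}

/* Expands to ((1 + 2) * 2). */
static int
ok_result(void)
{
	return DOUBLE_OK(1 + 2);
}
```

With the argument 1 + 2, the unguarded form evaluates to 5 and the guarded form to 6.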


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
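
The register-now, flush-later split between pmap_enter() and pmap_update() can be sketched as below. The struct, the fixed-size table, and the *_sketch functions are hypothetical; a real backend decides how to merge or shortcut the pending ranges.

```c
#include <stddef.h>

#define MAX_PENDING	8

/* Hypothetical pmap-like object tracking pending I$ flushes by page. */
struct pmap_sketch {
	unsigned long pending[MAX_PENDING];
	size_t npending;
	unsigned int flush_ops;	/* cache operations actually issued */
};

/* Issue every pending flush as one batch (counted as one operation). */
static void
update_sketch(struct pmap_sketch *pm)
{
	if (pm->npending == 0)
		return;
	pm->flush_ops++;	/* a real backend could merge ranges here */
	pm->npending = 0;
}

/* Map an executable page: only register the flush, do not perform it. */
static void
enter_exec_sketch(struct pmap_sketch *pm, unsigned long va)
{
	if (pm->npending == MAX_PENDING)
		update_sketch(pm);	/* table full: drain before adding */
	pm->pending[pm->npending++] = va;
}
```

Several consecutive enter calls from a single fault then cost one batched cache operation rather than one per page.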


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to work around the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
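
A sketch of a common per-cache descriptor along the lines described above. The struct name, field names, and cache_info_fill() are assumptions for illustration; the derived field is simply the total size divided by the number of ways.

```c
/* Hypothetical layout; field names are illustrative. */
struct cache_info_sketch {
	unsigned int size;	/* total size in bytes */
	unsigned int linesize;	/* line size in bytes */
	unsigned int sets;	/* number of ways ("sets" in the log) */
	unsigned int setsize;	/* bytes per way, kept for convenience */
};

/* Fill one descriptor and derive the per-way size. */
static void
cache_info_fill(struct cache_info_sketch *c, unsigned int size,
    unsigned int linesize, unsigned int sets)
{
	c->size = size;
	c->linesize = linesize;
	c->sets = sets;
	c->setsize = (sets != 0) ? size / sets : 0;
}
```

One such struct per cache (I$, D$, L2, L3) lets reporting code read the way count directly instead of inferring it from the processor type.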


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
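
The dynamic check can be sketched as a range test on the stack pointer. on_altstack() is a hypothetical helper, and treating the top address as inclusive is an assumption made here for a downward-growing stack; the kernel's actual test may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return nonzero if sp lies within the alternate stack [base, base + size].
 * The subtraction is unsigned, so sp < base wraps to a huge value and a
 * single comparison covers both bounds.
 */
static int
on_altstack(uintptr_t sp, uintptr_t base, size_t size)
{
	return size != 0 && sp - base <= size;
}
```

Because the answer is computed from the current stack pointer instead of a saved flag, a longjmp out of the handler back to the normal stack is reflected immediately.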


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful set of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.
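
Dispatching cache operations through per-CPU function pointers might look roughly like this. All names are hypothetical, and the backend only counts lines instead of touching a cache.

```c
/* Hypothetical per-CPU record holding a cache routine chosen at boot. */
struct cache_cpu_sketch {
	void (*ci_sync_icache)(struct cache_cpu_sketch *, unsigned long,
	    unsigned long);
	unsigned int ci_lines_synced;	/* bookkeeping for the example */
};

/* One possible backend: pretend a cache line covers 32 bytes. */
static void
sketch32_sync_icache(struct cache_cpu_sketch *ci, unsigned long va,
    unsigned long len)
{
	(void)va;
	ci->ci_lines_synced += (unsigned int)((len + 31) / 32);
}

/* Stand-in for a ConfigCache routine: it fills in the pointers. */
static void
sketch32_config_cache(struct cache_cpu_sketch *ci)
{
	ci->ci_sync_icache = sketch32_sync_icache;
	ci->ci_lines_synced = 0;
}

/* Invocation macro dispatching through the per-CPU pointer. */
#define SKETCH_SYNC_ICACHE(ci, va, len) \
	((*(ci)->ci_sync_icache)((ci), (va), (len)))
```

A platform with a single set of cache routines could redefine the invocation macro to call its backend directly and skip the indirection.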


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.
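
A rough sketch of what a per-cpu held mutex counter can look like is given
below. It is an illustration under assumptions, not the committed code: the
struct layouts, the `_sketch` names and the field name ci_mutex_level are
chosen for this example only, and DIAGNOSTIC is defined by hand here where a
kernel would take it from its configuration.

```c
#define DIAGNOSTIC 1	/* assumption: normally set by the kernel config */

struct cpu_info_sketch {
	int ci_mutex_level;	/* number of mutexes this CPU currently holds */
};

struct mutex_sketch {
	int mtx_locked;
};

void
mtx_enter_sketch(struct cpu_info_sketch *ci, struct mutex_sketch *mtx)
{
	mtx->mtx_locked = 1;	/* real code would spin and raise the IPL */
#ifdef DIAGNOSTIC
	ci->ci_mutex_level++;
#endif
}

void
mtx_leave_sketch(struct cpu_info_sketch *ci, struct mutex_sketch *mtx)
{
#ifdef DIAGNOSTIC
	ci->ci_mutex_level--;
#endif
	mtx->mtx_locked = 0;
}

/* A diagnostic check could refuse to sleep while this returns nonzero. */
int
cpu_holds_mutexes(const struct cpu_info_sketch *ci)
{
	return ci->ci_mutex_level != 0;
}
```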


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, the physical cpuid is no longer used to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to alloc_contiguous_pages() because it now only allocates whole pages.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.
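
One way such a compile-time elision can work is sketched below, assuming a
hypothetical CACHE_PHYS_INDEXED option and made-up function names; the real
macros in the tree may look quite different. The point is that a macro
argument which never appears in the expansion is never evaluated, so the
va->pa translation costs nothing on virtually-indexed configurations.

```c
#include <stddef.h>
#include <stdint.h>

int pa_lookups;			/* counts how often the translation ran */
uint64_t last_va, last_pa;	/* what the backend was last asked to sync */

/* Stand-in for a costly va->pa translation (hypothetical). */
uint64_t
fake_va_to_pa(uint64_t va)
{
	pa_lookups++;
	return va & 0x3fffffffULL;
}

/* Stand-in for the real cache routine: just records its arguments. */
void
cache_sync_backend(uint64_t va, uint64_t pa, size_t len)
{
	(void)len;
	last_va = va;
	last_pa = pa;
}

#ifdef CACHE_PHYS_INDEXED
/* Physically-indexed caches: the physical address is really needed. */
#define cache_sync(va, pa, len)	cache_sync_backend((va), (pa), (len))
#else
/* `pa' is not expanded, so its expression is never evaluated. */
#define cache_sync(va, pa, len)	cache_sync_backend((va), 0, (len))
#endif
```

With CACHE_PHYS_INDEXED undefined, a call such as
cache_sync(va, fake_va_to_pa(va), len) never invokes fake_va_to_pa().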


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields an 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
It is meant to avoid using virtual addresses on secondary cpus during early bootstrap, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information are now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn
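
The idea of keeping the generator state per cpu can be sketched as follows.
This is an illustration only: the Park-Miller "minimal standard" generator
(multiplier 16807, modulus 2^31-1) is assumed here, the commit message does
not say which generator is used, and the struct and function names are
invented for the example. The seeding helper shows one possible way to keep
the seed inside the stable range.

```c
#include <stdint.h>

#define RND_MODULUS	0x7fffffffULL	/* 2^31 - 1 */

/* Hypothetical stand-in for the seed slot kept in each cpu's info. */
struct cpu_rng {
	uint32_t seed;
};

/*
 * Fold an arbitrary seed into [1, 2^31 - 2]. Zero is avoided because it
 * is a fixed point of the multiplicative generator below.
 */
void
cpu_rng_seed(struct cpu_rng *r, uint32_t seed)
{
	seed %= RND_MODULUS;
	r->seed = seed ? seed : 1;
}

/* Advance this cpu's generator; other cpus' sequences are unaffected. */
uint32_t
cpu_rng_next(struct cpu_rng *r)
{
	r->seed = (uint32_t)((16807ULL * r->seed) % RND_MODULUS);
	return r->seed;
}
```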


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok
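
The effect of clearing the flag can be shown with a small model. The sketch
below is a simplification under assumptions: the struct, the field name
ci_want_resched and service_ast() are invented for illustration, and the
counter merely stands in for a call to mi_switch().

```c
/* Hypothetical, stripped-down per-cpu state. */
struct cpu_info_sketch {
	int ci_want_resched;	/* set when a reschedule is requested */
	int switch_attempts;	/* how many times we tried to switch */
};

#define need_resched(ci)	((ci)->ci_want_resched = 1)
#define clear_resched(ci)	((ci)->ci_want_resched = 0)

/* Simplified AST servicing: attempt a context switch only when asked. */
void
service_ast(struct cpu_info_sketch *ci)
{
	if (ci->ci_want_resched) {
		ci->switch_attempts++;	/* stands in for mi_switch() */
		clear_resched(ci);	/* even if the same proc was picked */
	}
}
```

Without the clear_resched() call, every later service_ast() would attempt
another switch, which is the behaviour the commit above removes.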


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.
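
For readers unfamiliar with the address spaces involved, the sketch below
shows how a physical address maps into XKPHYS, following the MIPS64 layout
(address bits 63..62 set to binary 10, the cache coherency attribute in bits
61..59). The function and macro names are made up for this illustration and
are not the ones used in the tree.

```c
#include <stdint.h>

#define XKPHYS_BASE	0x8000000000000000ULL	/* bits 63..62 = 0b10 */
#define CCA_SHIFT	59			/* attribute in bits 61..59 */
#define CCA_UNCACHED	2ULL
#define CCA_CACHED_NC	3ULL			/* cacheable, non-coherent */

#define KSEG_WINDOW	0x20000000ULL		/* KSEG0/KSEG1 cover 512MB */

/* Build a direct-mapped XKPHYS virtual address for a physical address. */
uint64_t
phys_to_xkphys(uint64_t pa, uint64_t cca)
{
	return XKPHYS_BASE | (cca << CCA_SHIFT) | pa;
}

/* Recover the physical address by dropping the region and CCA bits. */
uint64_t
xkphys_to_phys(uint64_t va)
{
	return va & ((1ULL << CCA_SHIFT) - 1);
}

/* The 32-bit compatibility segments only reach the first 512MB. */
int
kseg_can_map(uint64_t pa)
{
	return pa < KSEG_WINDOW;
}
```

A physical address of 1GB, for example, is out of reach of KSEG0/KSEG1 but
has a perfectly valid XKPHYS alias.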

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.
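
As a userland analogue of the kind of operation meant here, the sketch below
uses C11 <stdatomic.h> rather than the kernel's own atomic.h, so it is an
illustration of the technique and not the kernel API. The `sketch_` function
names and the flag values are hypothetical.

```c
#include <stdatomic.h>

#define P_FLAG_A	0x01u	/* hypothetical flag bits */
#define P_FLAG_B	0x04u

/* Atomically set bits: a single read-modify-write, safe against races. */
void
sketch_setbits_int(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_or(p, bits);
}

/* Atomically clear bits. */
void
sketch_clearbits_int(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_and(p, ~bits);
}
```

A plain `flag |= bit' is a separate load, modify and store, so an interrupt
arriving in between can lose an update; the atomic forms avoid that.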

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, so memory size and cpu
clock frequency are now detected.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.146 25-Feb-2024 cheloha

clockintr: rename "struct clockintr_queue" to "struct clockqueue"

The code has outgrown the original name for this struct. Both the
external and internal APIs have used the "clockqueue" namespace for
some time when operating on it, and that name is eyeball-consistent
with "clockintr" and "clockrequest", so "clockqueue" it is.


# 1.145 24-Jan-2024 cheloha

clockintr: switch from callee- to caller-allocated clockintr structs

Currently, clockintr_establish() calls malloc(9) to allocate a
clockintr struct on behalf of the caller. mpi@ says this behavior is
incompatible with dt(4). In particular, calling malloc(9) during the
initialization of a PCB outside of dt_pcb_alloc() is (a) awkward and
(b) may conflict with future changes/optimizations to PCB allocation.

To side-step the problem, this patch changes the clockintr subsystem
to use caller-allocated clockintr structs instead of callee-allocated
structs.

clockintr_establish() is named after softintr_establish(), which uses
malloc(9) internally to create softintr objects. The clockintr subsystem
is no longer using malloc(9), so the "establish" naming is no longer apt.
To avoid confusion, this patch also renames "clockintr_establish" to
"clockintr_bind".

Requested by mpi@. Tweaked by mpi@.

Thread: https://marc.info/?l=openbsd-tech&m=170597126103504&w=2

ok claudio@ mlarkin@ mpi@


Revision tags: OPENBSD_7_4_BASE
# 1.144 23-Aug-2023 cheloha

all platforms: separate cpu_initclocks() from cpu_startclock()

To give the primary CPU an opportunity to perform clock interrupt
preparation in a machine-independent manner we need to separate the
"initialization" parts of cpu_initclocks() from the "start the clock
interrupt" parts. Currently, cpu_initclocks() does everything all at
once, so there is no space for this MI setup.

Many platforms have more-or-less already done this separation by
implementing a separate routine named "cpu_startclock()". This patch
promotes cpu_startclock() from de facto standard to mandatory API.

- Prototype cpu_startclock() in sys/systm.h alongside cpu_initclocks().
The separation of responsibility between the two routines is a bit
fuzzy but the basic guidelines are as follows:

+ cpu_initclocks() must initialize hz, stathz, and profhz, and call
clockintr_init().

+ cpu_startclock() must call clockintr_cpu_init() and start the clock
interrupt cycle on the calling CPU.

These guidelines will shift in the future, but that's the way things
stand as of *this* commit.

- In initclocks(): first call cpu_initclocks(), then do MI setup, and
last call cpu_startclock().

- On platforms where cpu_startclock() already exists: don't call
cpu_startclock() from cpu_initclocks() anymore.

- On platforms where cpu_startclock() doesn't yet exist: implement it.
Usually this is as simple as dividing cpu_initclocks() in two.

Tested on amd64 (i8254, lapic), arm64, i386 (i8254, lapic), macppc,
mips64/octeon, and sparc64. Tested on arm/armv7 (agtimer(4)) by
phessler@ and jmatthew@. Tested on m88k/luna88k by aoyama@. Tested
on powerpc64 by gkoehler@ and mlarkin@. Tested on riscv64 by
jmatthew@.

Thread: https://marc.info/?l=openbsd-tech&m=169195251322149&w=2


# 1.143 05-Aug-2023 guenther

cpu_idle_{enter,leave} are no-ops on mips64, so just #define
away the calls

ok jca@


# 1.142 25-Jul-2023 cheloha

statclock: move profil(2), GPROF code to profclock(), gmonclock()

This patch isolates profil(2) and GPROF from statclock(). Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio). We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.

- Move profil(2)-specific code from statclock() to a new clock
interrupt callback, profclock(), in subr_prof.c. Each
schedstate_percpu has its own profclock handle. The profclock is
enabled/disabled for a given CPU when it is needed by the running
thread during mi_switch() and sched_exit().

- Move GPROF-specific code from statclock() to a new clock interrupt
callback, gmonclock(), in subr_prof.c. Where available, each cpu_info
has its own gmonclock handle . The gmonclock is enabled/disabled for
a given CPU via sysctl(2) in prof_state_toggle().

- Both profclock() and gmonclock() have a fixed period, profclock_period,
that is initialized during initclocks().

- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
and clockintr_stagger() via <sys/clockintr.h>. They have external
callers now.

- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete
spc_pscnt and spc_psdiv. The statclock frequency is not dynamic
anymore so these variables are now useless.

- Delete code/state related to the dynamic statclock frequency from
kern_clockintr.c. The statclock frequency can still be pseudo-random,
so move the contents of clockintr_statvar_init() into clockintr_init().

With input from miod@, deraadt@, and claudio@. Early revisions
cleaned up by claudio. Early revisions tested by claudio@. Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation
bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on
powerpc64 by gkoehler@.


Revision tags: OPENBSD_7_3_BASE
# 1.141 11-Jan-2023 visa

Add TLB bypass for instruction emulation

copyinsn() fetches a userland instruction through the direct map.
This lets emulation work with execute-only virtual memory mappings.

OK deraadt@


# 1.140 19-Nov-2022 cheloha

mips64, loongson, octeon: switch to clockintr

- Remove mips64-specific clock interrupt scheduling bits from cpu_info.
- Add missing tick_nsec initialization to cpu_initclocks().
- Disable the glxclk interrupt clock on loongson. visa@/miod@ say it
can be removed later if it isn't useful for anything else.
- Wire up cp0_intrclock.

Notes:

- The loongson apm_suspend() changes are untested, but deraadt@ claims
APM suspend/resume on loongson doesn't work anyway.
- loongson and octeon now have a randomized statclock(), stathz = hz.

With input from miod@, visa@. Tested by miod@, visa@.

Link: https://marc.info/?l=openbsd-tech&m=166776379603497&w=2

ok visa@ mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.139 22-Aug-2022 cheloha

mips64, octeon, loonson: trigger deferred clock interrupts from splx(9)

As with powerpc, powerpc64, and riscv64, on mips64 platforms we need
to isolate the clock interrupt schedule from the MD clock interrupt
code. To do this, we need to stop deferring clock interrupt work
until the next tick and instead defer the work until we logically
unmask the clock interrupt from splx(9).

Add a boolean (ci_clock_deferred) to the cpu_info struct to note
whether we need to trigger the clock interrupt by hand, and then
do so from splx(9) by calling md_triggerclock().

Currently md_triggerclock is only ever set to cp0_trigger_int5(). The
routine takes great care to ensure that INT5 has fired or will fire
before returning.

There are some loongson machines that use glxclk instead of CP0. They
can be switched to use CP0 later.

With input and advice from visa@ and miod@.

Compiled and extensively tested by visa@ and miod@ on various octeon
and loongson machines. No issues seen on octeon machines. miod@ saw
some odd things on loongsoon, but suggests that all issues are
probably unrelated to this patch.

Link: https://marc.info/?l=openbsd-tech&m=165929192702632&w=2

ok visa@, miod@


Revision tags: OPENBSD_7_1_BASE
# 1.138 28-Jan-2022 visa

Remove unused guarded read and write routines.

No objection from miod@


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and us it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set or ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
arent kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/ets/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structure to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several uses cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to work around the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
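
A rough sketch of what such a common per-cache struct could look like,
based only on the fields named above. The struct, field and function
names are hypothetical and are not taken from the actual header.

```c
#include <assert.h>

/* Hypothetical per-cache description; one instance per I$, D$, L2, L3. */
struct cache_info_example {
	unsigned int	ci_size;	/* total size, in bytes */
	unsigned int	ci_linesize;	/* line size, in bytes */
	unsigned int	ci_setsize;	/* bytes per set (way), for convenience */
	unsigned int	ci_sets;	/* number of sets (ways) */
};

/* Fill in a cache description; the set size is derived from the rest. */
static void
cache_info_example_fill(struct cache_info_example *ci, unsigned int size,
    unsigned int linesize, unsigned int sets)
{
	assert(sets != 0);
	ci->ci_size = size;
	ci->ci_linesize = linesize;
	ci->ci_sets = sets;
	ci->ci_setsize = size / sets;
}
```

With the number of sets stored next to the sizes, a caller can print it
directly instead of switching on the processor type.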


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.
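
The stack pointer comparison can be sketched as below. This is an
illustration under assumptions, not the kernel code: the function name
and parameters are hypothetical, and the range is treated as half-open,
[base, base + size), which may not match the real boundary handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return nonzero if sp lies inside the alternate signal stack.
 * The unsigned subtraction wraps when sp < ss_base, so a single
 * comparison covers both ends of the range.
 */
static int
on_altstack_example(uintptr_t sp, uintptr_t ss_base, size_t ss_size)
{
	return ss_size != 0 && (sp - ss_base) < ss_size;
}

/* Small self-check of the boundary behaviour assumed above. */
static void
on_altstack_example_check(void)
{
	assert(on_altstack_example(0x1000, 0x1000, 0x2000));
	assert(!on_altstack_example(0x3000, 0x1000, 0x2000));
}
```

Because the answer is computed from the current stack pointer each time,
nothing goes stale when a handler is left with longjmp instead of
returning.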

Testing and corrections by miod@, krw@, tobiasu@, pirofti@


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run
to a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing current cpu_info address into LLAddr register, for curcpu().
Unlike the previous implementation, we won't use physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it, smp_malloc() renamed alloc_contiguous_pages() because now it only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also few xheart modification for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
It avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.145 24-Jan-2024 cheloha

clockintr: switch from callee- to caller-allocated clockintr structs

Currently, clockintr_establish() calls malloc(9) to allocate a
clockintr struct on behalf of the caller. mpi@ says this behavior is
incompatible with dt(4). In particular, calling malloc(9) during the
initialization of a PCB outside of dt_pcb_alloc() is (a) awkward and
(b) may conflict with future changes/optimizations to PCB allocation.

To side-step the problem, this patch changes the clockintr subsystem
to use caller-allocated clockintr structs instead of callee-allocated
structs.

clockintr_establish() is named after softintr_establish(), which uses
malloc(9) internally to create softintr objects. The clockintr subsystem
is no longer using malloc(9), so the "establish" naming is no longer apt.
To avoid confusion, this patch also renames "clockintr_establish" to
"clockintr_bind".

Requested by mpi@. Tweaked by mpi@.

Thread: https://marc.info/?l=openbsd-tech&m=170597126103504&w=2

ok claudio@ mlarkin@ mpi@


Revision tags: OPENBSD_7_4_BASE
# 1.144 23-Aug-2023 cheloha

all platforms: separate cpu_initclocks() from cpu_startclock()

To give the primary CPU an opportunity to perform clock interrupt
preparation in a machine-independent manner we need to separate the
"initialization" parts of cpu_initclocks() from the "start the clock
interrupt" parts. Currently, cpu_initclocks() does everything all at
once, so there is no space for this MI setup.

Many platforms have more-or-less already done this separation by
implementing a separate routine named "cpu_startclock()". This patch
promotes cpu_startclock() from de facto standard to mandatory API.

- Prototype cpu_startclock() in sys/systm.h alongside cpu_initclocks().
The separation of responsibility between the two routines is a bit
fuzzy but the basic guidelines are as follows:

+ cpu_initclocks() must initialize hz, stathz, and profhz, and call
clockintr_init().

+ cpu_startclock() must call clockintr_cpu_init() and start the clock
interrupt cycle on the calling CPU.

These guidelines will shift in the future, but that's the way things
stand as of *this* commit.

- In initclocks(): first call cpu_initclocks(), then do MI setup, and
last call cpu_startclock().

- On platforms where cpu_startclock() already exists: don't call
cpu_startclock() from cpu_initclocks() anymore.

- On platforms where cpu_startclock() doesn't yet exist: implement it.
Usually this is as simple as dividing cpu_initclocks() in two.

Tested on amd64 (i8254, lapic), arm64, i386 (i8254, lapic), macppc,
mips64/octeon, and sparc64. Tested on arm/armv7 (agtimer(4)) by
phessler@ and jmatthew@. Tested on m88k/luna88k by aoyama@. Tested
on powerpc64 by gkoehler@ and mlarkin@. Tested on riscv64 by
jmatthew@.

Thread: https://marc.info/?l=openbsd-tech&m=169195251322149&w=2


# 1.143 05-Aug-2023 guenther

cpu_idle_{enter,leave} are no-ops on mips64, so just #define
away the calls

ok jca@


# 1.142 25-Jul-2023 cheloha

statclock: move profil(2), GPROF code to profclock(), gmonclock()

This patch isolates profil(2) and GPROF from statclock(). Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio). We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.

- Move profil(2)-specific code from statclock() to a new clock
interrupt callback, profclock(), in subr_prof.c. Each
schedstate_percpu has its own profclock handle. The profclock is
enabled/disabled for a given CPU when it is needed by the running
thread during mi_switch() and sched_exit().

- Move GPROF-specific code from statclock() to a new clock interrupt
callback, gmonclock(), in subr_prof.c. Where available, each cpu_info
has its own gmonclock handle . The gmonclock is enabled/disabled for
a given CPU via sysctl(2) in prof_state_toggle().

- Both profclock() and gmonclock() have a fixed period, profclock_period,
that is initialized during initclocks().

- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
and clockintr_stagger() via <sys/clockintr.h>. They have external
callers now.

- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete
spc_pscnt and spc_psdiv. The statclock frequency is not dynamic
anymore so these variables are now useless.

- Delete code/state related to the dynamic statclock frequency from
kern_clockintr.c. The statclock frequency can still be pseudo-random,
so move the contents of clockintr_statvar_init() into clockintr_init().

With input from miod@, deraadt@, and claudio@. Early revisions
cleaned up by claudio. Early revisions tested by claudio@. Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation
bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on
powerpc64 by gkoehler@.


Revision tags: OPENBSD_7_3_BASE
# 1.141 11-Jan-2023 visa

Add TLB bypass for instruction emulation

copyinsn() fetches a userland instruction through the direct map.
This lets emulation work with execute-only virtual memory mappings.

OK deraadt@


# 1.140 19-Nov-2022 cheloha

mips64, loongson, octeon: switch to clockintr

- Remove mips64-specific clock interrupt scheduling bits from cpu_info.
- Add missing tick_nsec initialization to cpu_initclocks().
- Disable the glxclk interrupt clock on loongson. visa@/miod@ say it
can be removed later if it isn't useful for anything else.
- Wire up cp0_intrclock.

Notes:

- The loongson apm_suspend() changes are untested, but deraadt@ claims
APM suspend/resume on loongson doesn't work anyway.
- loongson and octeon now have a randomized statclock(), stathz = hz.

With input from miod@, visa@. Tested by miod@, visa@.

Link: https://marc.info/?l=openbsd-tech&m=166776379603497&w=2

ok visa@ mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.139 22-Aug-2022 cheloha

mips64, octeon, loonson: trigger deferred clock interrupts from splx(9)

As with powerpc, powerpc64, and riscv64, on mips64 platforms we need
to isolate the clock interrupt schedule from the MD clock interrupt
code. To do this, we need to stop deferring clock interrupt work
until the next tick and instead defer the work until we logically
unmask the clock interrupt from splx(9).

Add a boolean (ci_clock_deferred) to the cpu_info struct to note
whether we need to trigger the clock interrupt by hand, and then
do so from splx(9) by calling md_triggerclock().

Currently md_triggerclock is only ever set to cp0_trigger_int5(). The
routine takes great care to ensure that INT5 has fired or will fire
before returning.
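
A minimal user-space sketch of the deferral described above. Only
ci_clock_deferred and the idea of triggering the clock from splx(9) come
from the commit message; the struct, the numeric IPL values and every
sketch_* name are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical numeric levels, for illustration only. */
#define SKETCH_IPL_NONE		0
#define SKETCH_IPL_CLOCK	5

struct cpu_info_sketch {
	volatile int ci_ipl;		/* current logical interrupt level */
	bool ci_clock_deferred;		/* clock fired while it was masked */
	int handled;			/* clock interrupts serviced directly */
	int triggers;			/* times the clock was re-triggered */
};

/* Stand-in for md_triggerclock(): only records that it was called. */
static void
sketch_triggerclock(struct cpu_info_sketch *ci)
{
	ci->triggers++;
}

/* Clock interrupt entry: defer the work if the clock is logically masked. */
static void
sketch_clock_intr(struct cpu_info_sketch *ci)
{
	if (ci->ci_ipl >= SKETCH_IPL_CLOCK) {
		ci->ci_clock_deferred = true;
		return;
	}
	ci->handled++;
}

/* Lower the level; re-trigger the clock by hand if one was deferred. */
static void
sketch_splx(struct cpu_info_sketch *ci, int newipl)
{
	ci->ci_ipl = newipl;
	if (ci->ci_clock_deferred && newipl < SKETCH_IPL_CLOCK) {
		ci->ci_clock_deferred = false;
		sketch_triggerclock(ci);
	}
}
```

The point of the design is that the deferred work runs as soon as the
level drops below the clock level, rather than waiting for the next tick.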

There are some loongson machines that use glxclk instead of CP0. They
can be switched to use CP0 later.

With input and advice from visa@ and miod@.

Compiled and extensively tested by visa@ and miod@ on various octeon
and loongson machines. No issues seen on octeon machines. miod@ saw
some odd things on loongson, but suggests that all issues are
probably unrelated to this patch.

Link: https://marc.info/?l=openbsd-tech&m=165929192702632&w=2

ok visa@, miod@


Revision tags: OPENBSD_7_1_BASE
# 1.138 28-Jan-2022 visa

Remove unused guarded read and write routines.

No objection from miod@


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.
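
A rough user-space sketch of the "first panic wins" rule described above.
The 512-byte size and the ci_panicbuf name come from the commit message;
the struct, the sketch_panic() helper and the use of stderr as the
"console" are assumptions made for illustration, not the kernel code.

```c
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_PANICBUF_LEN	512

struct cpu_panic_sketch {
	char ci_panicbuf[SKETCH_PANICBUF_LEN];
};

/*
 * Format the message, always print it, but only store it in the
 * per-CPU buffer if this CPU has not recorded a panic before.
 */
static void
sketch_panic(struct cpu_panic_sketch *ci, const char *fmt, ...)
{
	char msg[SKETCH_PANICBUF_LEN];
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);

	/* An empty buffer means this is the first panic on this CPU. */
	if (ci->ci_panicbuf[0] == '\0')
		snprintf(ci->ci_panicbuf, sizeof(ci->ci_panicbuf), "%s", msg);

	fprintf(stderr, "panic: %s\n", msg);
}
```

With one such struct per CPU, a second panic triggered from the debugger
leaves the first message available for later inspection.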

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.
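
as a hedged illustration of "something with bits that change", a
time-based backend could fold seconds and nanoseconds together as below.
the function names and the exact mixing (a shift by 20 and an xor) are
assumptions for this sketch, and timespec_get() stands in for the
kernel's nanotime().

```c
#include <stdint.h>
#include <time.h>

/* fold seconds and nanoseconds into one word; only the churn matters */
static unsigned int
sketch_mix_bits(uint64_t sec, uint64_t nsec)
{
	return (unsigned int)(nsec ^ (sec << 20));
}

/* hypothetical time-based backend: not accurate, just fast-changing */
static unsigned int
sketch_rnd_messybits(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return sketch_mix_bits((uint64_t)ts.tv_sec, (uint64_t)ts.tv_nsec);
}
```

an MD replacement would swap the body of sketch_rnd_messybits() for a
read of the cpu cycle counter and keep the same calling convention.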

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.
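
A small sketch of where such a statement sits in a busy loop. The macro
below is a hypothetical stand-in that is only a compiler barrier (GCC and
Clang inline-asm syntax); a real port would put its own hint instruction
there, such as PAUSE on i386/amd64. The function name and the flag
protocol are assumptions for illustration.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for CPU_BUSY_CYCLE(): compiler barrier only. */
#define SKETCH_BUSY_CYCLE()	__asm__ volatile("" ::: "memory")

/* Spin until another thread or CPU stores a non-zero value in *flag. */
static void
sketch_spin_until_set(atomic_int *flag)
{
	while (atomic_load_explicit(flag, memory_order_acquire) == 0)
		SKETCH_BUSY_CYCLE();
}
```

Keeping the hint inside one MI-named macro lets every spin loop pick up
the per-architecture behaviour without conditional code at the call site.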

OK kettenis@


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.
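
A minimal sketch of the dynamic check, assuming the alternate stack is
described by a base address and a size and that the range is treated as
half-open, [base, base + size). The function name and that boundary
choice are assumptions for illustration, not the kernel's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if sp lies inside the alternate signal stack.
 * The unsigned subtraction wraps to a huge value when sp is below
 * the base, so a single comparison covers both bounds.
 */
static bool
sketch_on_altstack(uintptr_t sp, uintptr_t ss_base, size_t ss_size)
{
	return ss_size != 0 && (sp - ss_base) < ss_size;
}
```

Because the answer is computed from the stack pointer each time, no flag
can go stale when a handler leaves via longjmp instead of returning.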

Testing and corrections by miod@, krw@, tobiasu@, pirofti@


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, we no longer use the physical cpuid to
fetch curcpu(). This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
It avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok
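
The flag handling described in this commit can be sketched in plain C. This
is only an illustration, not the kernel code: struct cpu_sketch, the
switch_attempts counter and every *_sketch function are hypothetical names;
only want_resched, need_resched, clear_resched and mi_switch come from the
commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU state; not the real struct cpu_info. */
struct cpu_sketch {
	bool want_resched;	/* set when a reschedule is requested */
	int  switch_attempts;	/* counts attempted context switches */
};

/* Request a reschedule at the next AST. */
static void
need_resched_sketch(struct cpu_sketch *ci)
{
	ci->want_resched = true;
}

/* Drop the request again. */
static void
clear_resched_sketch(struct cpu_sketch *ci)
{
	ci->want_resched = false;
}

/* Models mi_switch(): clears the flag even if the same proc keeps running. */
static void
mi_switch_sketch(struct cpu_sketch *ci)
{
	ci->switch_attempts++;
	clear_resched_sketch(ci);
}

/* Models AST servicing: only attempt a switch while the flag is set. */
static void
ast_sketch(struct cpu_sketch *ci)
{
	if (ci->want_resched)
		mi_switch_sketch(ci);
}
```

Without the clear in mi_switch_sketch(), every later call to ast_sketch()
would attempt another switch, which is the behaviour the commit removes.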


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in December 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.144 23-Aug-2023 cheloha

all platforms: separate cpu_initclocks() from cpu_startclock()

To give the primary CPU an opportunity to perform clock interrupt
preparation in a machine-independent manner we need to separate the
"initialization" parts of cpu_initclocks() from the "start the clock
interrupt" parts. Currently, cpu_initclocks() does everything all at
once, so there is no space for this MI setup.

Many platforms have more-or-less already done this separation by
implementing a separate routine named "cpu_startclock()". This patch
promotes cpu_startclock() from de facto standard to mandatory API.

- Prototype cpu_startclock() in sys/systm.h alongside cpu_initclocks().
The separation of responsibility between the two routines is a bit
fuzzy but the basic guidelines are as follows:

+ cpu_initclocks() must initialize hz, stathz, and profhz, and call
clockintr_init().

+ cpu_startclock() must call clockintr_cpu_init() and start the clock
interrupt cycle on the calling CPU.

These guidelines will shift in the future, but that's the way things
stand as of *this* commit.

- In initclocks(): first call cpu_initclocks(), then do MI setup, and
last call cpu_startclock().

- On platforms where cpu_startclock() already exists: don't call
cpu_startclock() from cpu_initclocks() anymore.

- On platforms where cpu_startclock() doesn't yet exist: implement it.
Usually this is as simple as dividing cpu_initclocks() in two.

Tested on amd64 (i8254, lapic), arm64, i386 (i8254, lapic), macppc,
mips64/octeon, and sparc64. Tested on arm/armv7 (agtimer(4)) by
phessler@ and jmatthew@. Tested on m88k/luna88k by aoyama@. Tested
on powerpc64 by gkoehler@ and mlarkin@. Tested on riscv64 by
jmatthew@.

Thread: https://marc.info/?l=openbsd-tech&m=169195251322149&w=2
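
The call order this commit establishes in initclocks() can be sketched as
follows. This is an illustration only: the *_sketch functions and the
call_log buffer are hypothetical stand-ins that record ordering, and the MI
setup step is reduced to a placeholder.

```c
#include <string.h>

/* Hypothetical log of which step ran, in order. */
static char call_log[64];

/* Stand-in for the MD routine that sets hz, stathz, profhz. */
static void
cpu_initclocks_sketch(void)
{
	strcat(call_log, "init;");
}

/* Placeholder for the machine-independent setup done in between. */
static void
mi_clock_setup_sketch(void)
{
	strcat(call_log, "mi;");
}

/* Stand-in for the MD routine that starts the clock interrupt cycle. */
static void
cpu_startclock_sketch(void)
{
	strcat(call_log, "start;");
}

/* Order described in the commit: initialize, MI setup, then start. */
static void
initclocks_sketch(void)
{
	call_log[0] = '\0';
	cpu_initclocks_sketch();
	mi_clock_setup_sketch();
	cpu_startclock_sketch();
}
```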


# 1.143 05-Aug-2023 guenther

cpu_idle_{enter,leave} are no-ops on mips64, so just #define
away the calls

ok jca@


# 1.142 25-Jul-2023 cheloha

statclock: move profil(2), GPROF code to profclock(), gmonclock()

This patch isolates profil(2) and GPROF from statclock(). Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio). We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.

- Move profil(2)-specific code from statclock() to a new clock
interrupt callback, profclock(), in subr_prof.c. Each
schedstate_percpu has its own profclock handle. The profclock is
enabled/disabled for a given CPU when it is needed by the running
thread during mi_switch() and sched_exit().

- Move GPROF-specific code from statclock() to a new clock interrupt
callback, gmonclock(), in subr_prof.c. Where available, each cpu_info
has its own gmonclock handle. The gmonclock is enabled/disabled for
a given CPU via sysctl(2) in prof_state_toggle().

- Both profclock() and gmonclock() have a fixed period, profclock_period,
that is initialized during initclocks().

- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
and clockintr_stagger() via <sys/clockintr.h>. They have external
callers now.

- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete
spc_pscnt and spc_psdiv. The statclock frequency is not dynamic
anymore so these variables are now useless.

- Delete code/state related to the dynamic statclock frequency from
kern_clockintr.c. The statclock frequency can still be pseudo-random,
so move the contents of clockintr_statvar_init() into clockintr_init().

With input from miod@, deraadt@, and claudio@. Early revisions
cleaned up by claudio. Early revisions tested by claudio@. Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation
bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on
powerpc64 by gkoehler@.


Revision tags: OPENBSD_7_3_BASE
# 1.141 11-Jan-2023 visa

Add TLB bypass for instruction emulation

copyinsn() fetches a userland instruction through the direct map.
This lets emulation work with execute-only virtual memory mappings.

OK deraadt@


# 1.140 19-Nov-2022 cheloha

mips64, loongson, octeon: switch to clockintr

- Remove mips64-specific clock interrupt scheduling bits from cpu_info.
- Add missing tick_nsec initialization to cpu_initclocks().
- Disable the glxclk interrupt clock on loongson. visa@/miod@ say it
can be removed later if it isn't useful for anything else.
- Wire up cp0_intrclock.

Notes:

- The loongson apm_suspend() changes are untested, but deraadt@ claims
APM suspend/resume on loongson doesn't work anyway.
- loongson and octeon now have a randomized statclock(), stathz = hz.

With input from miod@, visa@. Tested by miod@, visa@.

Link: https://marc.info/?l=openbsd-tech&m=166776379603497&w=2

ok visa@ mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.139 22-Aug-2022 cheloha

mips64, octeon, loongson: trigger deferred clock interrupts from splx(9)

As with powerpc, powerpc64, and riscv64, on mips64 platforms we need
to isolate the clock interrupt schedule from the MD clock interrupt
code. To do this, we need to stop deferring clock interrupt work
until the next tick and instead defer the work until we logically
unmask the clock interrupt from splx(9).

Add a boolean (ci_clock_deferred) to the cpu_info struct to note
whether we need to trigger the clock interrupt by hand, and then
do so from splx(9) by calling md_triggerclock().

Currently md_triggerclock is only ever set to cp0_trigger_int5(). The
routine takes great care to ensure that INT5 has fired or will fire
before returning.

There are some loongson machines that use glxclk instead of CP0. They
can be switched to use CP0 later.

With input and advice from visa@ and miod@.

Compiled and extensively tested by visa@ and miod@ on various octeon
and loongson machines. No issues seen on octeon machines. miod@ saw
some odd things on loongson, but suggests that all issues are
probably unrelated to this patch.

Link: https://marc.info/?l=openbsd-tech&m=165929192702632&w=2

ok visa@, miod@
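
The deferral scheme described above can be modelled in a few lines of C.
This is a sketch under stated assumptions, not the mips64 code: the struct,
the IPL values and all *_sketch names are hypothetical, and the trigger
routine simply calls the handler directly, whereas the commit says the real
cp0_trigger_int5() makes INT5 fire.

```c
#include <stdbool.h>

#define IPL_NONE_SKETCH		0
#define IPL_CLOCK_SKETCH	5	/* assumed level, for illustration */

/* Hypothetical stand-in for the relevant cpu_info fields. */
struct cpu_sketch {
	int  ci_ipl;			/* current interrupt priority level */
	bool ci_clock_deferred;		/* clock work postponed by masking */
	int  clock_runs;		/* how often the clock work ran */
};

/* Clock interrupt entry: defer if the clock is logically masked. */
static void
clock_intr_sketch(struct cpu_sketch *ci)
{
	if (ci->ci_ipl >= IPL_CLOCK_SKETCH) {
		ci->ci_clock_deferred = true;
		return;
	}
	ci->clock_runs++;
}

/* Stand-in for md_triggerclock(): re-enter the handler by hand. */
static void
md_triggerclock_sketch(struct cpu_sketch *ci)
{
	clock_intr_sketch(ci);
}

/* Lower the IPL; replay a deferred clock interrupt once it is unmasked. */
static void
splx_sketch(struct cpu_sketch *ci, int newipl)
{
	ci->ci_ipl = newipl;
	if (ci->ci_clock_deferred && newipl < IPL_CLOCK_SKETCH) {
		ci->ci_clock_deferred = false;
		md_triggerclock_sketch(ci);
	}
}
```

The point of the design is that the deferred work runs as soon as the clock
is logically unmasked, instead of waiting for the next tick.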


Revision tags: OPENBSD_7_1_BASE
# 1.138 28-Jan-2022 visa

Remove unused guarded read and write routines.

No objection from miod@


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@
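
The "first panic wins" rule for the per-CPU buffer can be sketched like
this. It is a simplified illustration: struct cpu_sketch and
panic_record_sketch() are hypothetical names, only ci_panicbuf and its
512-byte size come from the commit message, and console output and the
atomic panicstr handling are left out.

```c
#include <stdio.h>
#include <string.h>

#define PANICBUF_LEN	512	/* size stated in the commit message */

/* Hypothetical stand-in for the relevant part of struct cpu_info. */
struct cpu_sketch {
	char ci_panicbuf[PANICBUF_LEN];
};

/*
 * Record a panic message for this CPU.
 * Returns 1 if the buffer was filled by this call, 0 if an earlier
 * message is already stored and was left untouched.
 */
static int
panic_record_sketch(struct cpu_sketch *ci, const char *msg)
{
	if (ci->ci_panicbuf[0] != '\0')
		return 0;
	snprintf(ci->ci_panicbuf, sizeof(ci->ci_panicbuf), "%s", msg);
	return 1;
}
```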


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@
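
Why macro arguments need parentheses can be shown with a generic example.
The two macros below are hypothetical and are not taken from cpu.h; they
only illustrate the class of bug this commit guards against.

```c
/* Unguarded: the argument text is pasted into the expression as-is. */
#define DOUBLE_UNGUARDED(x)	(x * 2)

/* Guarded: parentheses keep the whole argument as a single operand. */
#define DOUBLE_GUARDED(x)	((x) * 2)
```

With the argument 1 + 2, the unguarded form expands to (1 + 2 * 2), so
operator precedence changes the result, while the guarded form expands to
((1 + 2) * 2).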


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
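
A busy loop using such a statement might look like the sketch below. It
assumes a GCC- or Clang-style compiler; CPU_BUSY_CYCLE_SKETCH() and
spin_until_sketch() are hypothetical names, and the macro body is only one
plausible way to express "PAUSE on i386/amd64 and empty on others", not the
definition from the tree.

```c
/* Hypothetical macro: PAUSE hint on x86, nothing elsewhere. */
#if defined(__i386__) || defined(__x86_64__)
#define CPU_BUSY_CYCLE_SKETCH()	__asm__ __volatile__("pause" ::: "memory")
#else
#define CPU_BUSY_CYCLE_SKETCH()	do { /* nothing */ } while (0)
#endif

/*
 * Spin until *flag becomes non-zero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting.
 */
static int
spin_until_sketch(volatile int *flag, int max_spins)
{
	int spins = 0;

	while (*flag == 0 && spins < max_spins) {
		CPU_BUSY_CYCLE_SKETCH();	/* hint placed in the loop body */
		spins++;
	}
	return spins;
}
```

The iteration limit exists only so the sketch terminates when nothing ever
sets the flag; a real spin loop would normally wait on another CPU.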


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to work around the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
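
A common per-cache record of the kind described here could look like the
sketch below. The struct layout, field names and helper are assumptions
made for illustration; only the four quantities (line size, number of sets,
total size, set size) come from the commit message, which uses "sets" to
mean ways.

```c
/* Hypothetical per-cache description; one instance per I$, D$, L2, L3. */
struct cache_info_sketch {
	unsigned int size;	/* total cache size, in bytes */
	unsigned int linesize;	/* line size, in bytes */
	unsigned int setsize;	/* size of one set (way), in bytes */
	unsigned int sets;	/* number of sets (ways) */
};

/* Fill in a record, deriving the per-set size from the total size. */
static struct cache_info_sketch
cache_info_make_sketch(unsigned int size, unsigned int linesize,
    unsigned int sets)
{
	struct cache_info_sketch ci;

	ci.size = size;
	ci.linesize = linesize;
	ci.sets = sets;
	ci.setsize = (sets != 0) ? size / sets : 0;
	return ci;
}
```

Storing the set count in the record is what lets reporting code print it
directly rather than deriving it from the processor type.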


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
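
The dynamic check described above amounts to a range test on the stack
pointer. The sketch below is an illustration, not the kernel routine:
on_altstack_sketch() is a hypothetical name, and treating the alternate
stack as the half-open range [base, base + size) is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Report whether sp lies within the alternate signal stack.
 * The unsigned subtraction wraps to a huge value when sp < base,
 * so a single comparison covers both bounds.
 */
static bool
on_altstack_sketch(uintptr_t sp, uintptr_t base, size_t size)
{
	return (sp - base) < size;
}
```

Because the answer is computed from the current stack pointer rather than
from a saved flag, it stays correct after a longjmp out of the handler.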


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' errata as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, we no longer use the physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
It avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will from the
necessary hardware mask registers from it.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok
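
The effect of clearing the flag can be illustrated with a minimal userland
sketch. Every name below (cpu_sketch and the *_sketch functions) is a
hypothetical stand-in for illustration, not the kernel's code; the sketch
only assumes that an AST attempts a context switch whenever the flag is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU state; not the kernel's struct cpu_info. */
struct cpu_sketch {
	int want_resched;	/* set when a reschedule is requested */
	int switch_attempts;	/* counts context switch attempts */
};

/* Models need_resched(): request a reschedule on this CPU. */
static void
need_resched_sketch(struct cpu_sketch *ci)
{
	ci->want_resched = 1;
}

/* Models the new macro: forget the pending request. */
static void
clear_resched_sketch(struct cpu_sketch *ci)
{
	ci->want_resched = 0;
}

/* Models mi_switch(): clears the flag even if the same proc is picked. */
static void
mi_switch_sketch(struct cpu_sketch *ci)
{
	ci->switch_attempts++;
	clear_resched_sketch(ci);
}

/* Models servicing an AST: only switch when a reschedule is wanted. */
static void
ast_sketch(struct cpu_sketch *ci)
{
	assert(ci != NULL);
	if (ci->want_resched)
		mi_switch_sketch(ci);
}
```

Without the clear in mi_switch_sketch(), every later ast_sketch() call would
attempt another switch, which is the behaviour the commit removes.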


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanic
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.143 05-Aug-2023 guenther

cpu_idle_{enter,leave} are no-ops on mips64, so just #define
away the calls

ok jca@


# 1.142 25-Jul-2023 cheloha

statclock: move profil(2), GPROF code to profclock(), gmonclock()

This patch isolates profil(2) and GPROF from statclock(). Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio). We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.

- Move profil(2)-specific code from statclock() to a new clock
interrupt callback, profclock(), in subr_prof.c. Each
schedstate_percpu has its own profclock handle. The profclock is
enabled/disabled for a given CPU when it is needed by the running
thread during mi_switch() and sched_exit().

- Move GPROF-specific code from statclock() to a new clock interrupt
callback, gmonclock(), in subr_prof.c. Where available, each cpu_info
has its own gmonclock handle. The gmonclock is enabled/disabled for
a given CPU via sysctl(2) in prof_state_toggle().

- Both profclock() and gmonclock() have a fixed period, profclock_period,
that is initialized during initclocks().

- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
and clockintr_stagger() via <sys/clockintr.h>. They have external
callers now.

- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete
spc_pscnt and spc_psdiv. The statclock frequency is not dynamic
anymore so these variables are now useless.

- Delete code/state related to the dynamic statclock frequency from
kern_clockintr.c. The statclock frequency can still be pseudo-random,
so move the contents of clockintr_statvar_init() into clockintr_init().

With input from miod@, deraadt@, and claudio@. Early revisions
cleaned up by claudio. Early revisions tested by claudio@. Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation
bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on
powerpc64 by gkoehler@.


Revision tags: OPENBSD_7_3_BASE
# 1.141 11-Jan-2023 visa

Add TLB bypass for instruction emulation

copyinsn() fetches a userland instruction through the direct map.
This lets emulation work with execute-only virtual memory mappings.

OK deraadt@


# 1.140 19-Nov-2022 cheloha

mips64, loongson, octeon: switch to clockintr

- Remove mips64-specific clock interrupt scheduling bits from cpu_info.
- Add missing tick_nsec initialization to cpu_initclocks().
- Disable the glxclk interrupt clock on loongson. visa@/miod@ say it
can be removed later if it isn't useful for anything else.
- Wire up cp0_intrclock.

Notes:

- The loongson apm_suspend() changes are untested, but deraadt@ claims
APM suspend/resume on loongson doesn't work anyway.
- loongson and octeon now have a randomized statclock(), stathz = hz.

With input from miod@, visa@. Tested by miod@, visa@.

Link: https://marc.info/?l=openbsd-tech&m=166776379603497&w=2

ok visa@ mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.139 22-Aug-2022 cheloha

mips64, octeon, loongson: trigger deferred clock interrupts from splx(9)

As with powerpc, powerpc64, and riscv64, on mips64 platforms we need
to isolate the clock interrupt schedule from the MD clock interrupt
code. To do this, we need to stop deferring clock interrupt work
until the next tick and instead defer the work until we logically
unmask the clock interrupt from splx(9).

Add a boolean (ci_clock_deferred) to the cpu_info struct to note
whether we need to trigger the clock interrupt by hand, and then
do so from splx(9) by calling md_triggerclock().

Currently md_triggerclock is only ever set to cp0_trigger_int5(). The
routine takes great care to ensure that INT5 has fired or will fire
before returning.

There are some loongson machines that use glxclk instead of CP0. They
can be switched to use CP0 later.

With input and advice from visa@ and miod@.

Compiled and extensively tested by visa@ and miod@ on various octeon
and loongson machines. No issues seen on octeon machines. miod@ saw
some odd things on loongson, but suggests that all issues are
probably unrelated to this patch.

Link: https://marc.info/?l=openbsd-tech&m=165929192702632&w=2

ok visa@, miod@
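
The deferral scheme described above can be sketched in plain C. This is a
simplified model under stated assumptions: the IPL values, struct cpu_sketch
and all *_sketch functions are hypothetical, and only ci_clock_deferred and
md_triggerclock() are names taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical priority levels; the real values are platform-defined. */
enum { IPL_NONE_SKETCH = 0, IPL_CLOCK_SKETCH = 5, IPL_HIGH_SKETCH = 7 };

/* Hypothetical per-CPU state; not the kernel's struct cpu_info. */
struct cpu_sketch {
	int	ipl;		/* current logical interrupt level */
	bool	clock_deferred;	/* models ci_clock_deferred */
	int	clock_runs;	/* times the clock work actually ran */
};

/* Clock interrupt entry: run the work, or note that it was masked. */
static void
clock_intr_sketch(struct cpu_sketch *ci)
{
	assert(ci != NULL);
	if (ci->ipl >= IPL_CLOCK_SKETCH) {
		ci->clock_deferred = true;
		return;
	}
	ci->clock_deferred = false;
	ci->clock_runs++;
}

/* Models md_triggerclock(): raise the clock interrupt by hand. */
static void
triggerclock_sketch(struct cpu_sketch *ci)
{
	clock_intr_sketch(ci);
}

/* Models splx(9): lower the level, then replay a deferred clock tick. */
static void
splx_sketch(struct cpu_sketch *ci, int newipl)
{
	ci->ipl = newipl;
	if (ci->clock_deferred && newipl < IPL_CLOCK_SKETCH)
		triggerclock_sketch(ci);
}
```

The point of the design is that deferred work runs as soon as the clock is
logically unmasked, rather than waiting for the next tick.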


Revision tags: OPENBSD_7_1_BASE
# 1.138 28-Jan-2022 visa

Remove unused guarded read and write routines.

No objection from miod@


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@
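
A minimal userland sketch of the two rules above (first panic on a CPU owns
that CPU's buffer; the first CPU to panic claims panicstr atomically). It
uses C11 atomics as a stand-in for the kernel's own atomic primitives, and
cpu_sketch, panicstr_sketch and panic_record_sketch() are hypothetical names.
Only the 512-byte size and the ci_panicbuf idea come from the commit message.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdatomic.h>
#include <stdio.h>

#define PANICBUF_SKETCH_SIZE	512

/* Hypothetical per-CPU state; must start zero-initialized. */
struct cpu_sketch {
	char panicbuf[PANICBUF_SKETCH_SIZE];	/* models ci_panicbuf */
};

/* Models the global panicstr; NULL until the first panic. */
static _Atomic(const char *) panicstr_sketch;

/*
 * Record a panic message for this CPU.
 * Returns 1 if this call was the first panic system-wide, 0 otherwise.
 */
static int
panic_record_sketch(struct cpu_sketch *ci, const char *fmt, ...)
{
	const char *expected = NULL;
	va_list ap;

	assert(ci != NULL);
	/* Only the first panic on this CPU writes the buffer. */
	if (ci->panicbuf[0] == '\0') {
		va_start(ap, fmt);
		vsnprintf(ci->panicbuf, sizeof(ci->panicbuf), fmt, ap);
		va_end(ap);
	}
	/* Only the first CPU to get here wins the global pointer. */
	return atomic_compare_exchange_strong(&panicstr_sketch, &expected,
	    ci->panicbuf);
}
```

Because each CPU writes only its own buffer, two simultaneous panics cannot
interleave their text; the compare-and-swap merely decides which one is
reported as first.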


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
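
As a rough illustration of how such a statement sits in a busy loop, here is
a minimal userland sketch. The macro body is a stand-in, not the kernel's
definition: it assumes the "empty on others" case, with a compiler barrier
added (GCC/Clang inline asm syntax). spin_until_set() and max_spins are
hypothetical names used only for this example.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for CPU_BUSY_CYCLE(): no instruction is emitted,
 * but the "memory" clobber keeps the compiler from caching values across
 * loop iterations. An x86 version would use PAUSE here instead.
 */
#define CPU_BUSY_CYCLE()	__asm__ volatile("" ::: "memory")

/*
 * Spin until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting.
 */
static unsigned int
spin_until_set(atomic_int *flag, unsigned int max_spins)
{
	unsigned int n = 0;

	while (atomic_load_explicit(flag, memory_order_acquire) == 0 &&
	    n < max_spins) {
		CPU_BUSY_CYCLE();
		n++;
	}
	return n;
}
```

The bound on the loop exists only so the sketch terminates when run on its
own; a real spin loop would normally wait on another CPU without one.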


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
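
The register-now, flush-later idea can be sketched in plain C as below. This
is an assumption-laden model, not the pmap code: struct icache_pending,
pending_register(), pending_flush() and PAGE_SZ are hypothetical names, and
merging everything into one covering range is just one possible
optimization (it may cover gaps between pages).

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ	4096u	/* hypothetical page size for the sketch */

/* One pending I$ flush range [start, end), plus a counter for testing. */
struct icache_pending {
	uintptr_t	start;
	uintptr_t	end;
	int		valid;
	unsigned int	flushes;	/* how many real flushes were issued */
};

/* What pmap_enter() would do in this model: only record the page. */
static void
pending_register(struct icache_pending *p, uintptr_t va)
{
	uintptr_t s = va & ~(uintptr_t)(PAGE_SZ - 1);
	uintptr_t e = s + PAGE_SZ;

	if (!p->valid) {
		p->start = s;
		p->end = e;
		p->valid = 1;
		return;
	}
	if (s < p->start)
		p->start = s;
	if (e > p->end)
		p->end = e;
}

/*
 * What pmap_update() would do in this model: issue a single flush for
 * the accumulated range. Returns the number of bytes covered.
 */
static size_t
pending_flush(struct icache_pending *p)
{
	size_t len;

	if (!p->valid)
		return 0;
	len = (size_t)(p->end - p->start);
	p->flushes++;		/* the actual cache operation would go here */
	p->valid = 0;
	return len;
}
```

With three neighbouring pages registered in a row, the model issues one
flush instead of three.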


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by creating extra wired TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
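
The dynamic check described above amounts to a range test on the stack
pointer. The following is a minimal sketch of that test, not the kernel's
actual routine: sp_on_altstack() and its parameter names are hypothetical,
and treating a zero-sized stack as "not on the altstack" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if sp lies inside [ss_base, ss_base + ss_size).
 * The unsigned subtraction wraps around when sp is below ss_base, so a
 * single comparison covers both the lower and the upper bound.
 */
static bool
sp_on_altstack(uintptr_t sp, uintptr_t ss_base, size_t ss_size)
{
	return ss_size != 0 && (sp - ss_base) < ss_size;
}
```

Because the answer is computed from the current stack pointer rather than
from a saved flag, it stays correct after a longjmp out of the handler.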


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, the physical cpuid is no longer used to
fetch curcpu(). This is required to implement IP27/35 SMP.
Implement getcurcpu() and setcurcpu() for it; rename smp_malloc() to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns
the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and
also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it has been unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok
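
To show what replacing plain bit operations with atomic ones looks like, here
is a userland sketch built on C11 <stdatomic.h>. It is only an analogue: the
kernel's atomic_{set,clear}bits_int are MD routines, and the
sketch_setbits_int()/sketch_clearbits_int() names below are hypothetical.

```c
#include <stdatomic.h>

/*
 * Atomically OR the bits in v into *p. A plain "*p |= v" is a separate
 * load, modify and store, so an interrupt or another CPU updating the
 * same word in between could have its change lost.
 */
static void
sketch_setbits_int(atomic_uint *p, unsigned int v)
{
	atomic_fetch_or(p, v);
}

/* Atomically clear the bits in v from *p. */
static void
sketch_clearbits_int(atomic_uint *p, unsigned int v)
{
	atomic_fetch_and(p, ~v);
}
```

A caller would then write sketch_setbits_int(&flags, SOME_FLAG) where it
previously wrote flags |= SOME_FLAG, with SOME_FLAG standing for any flag
bit.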


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memory size and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.142 25-Jul-2023 cheloha

statclock: move profil(2), GPROF code to profclock(), gmonclock()

This patch isolates profil(2) and GPROF from statclock(). Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio). We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.

- Move profil(2)-specific code from statclock() to a new clock
interrupt callback, profclock(), in subr_prof.c. Each
schedstate_percpu has its own profclock handle. The profclock is
enabled/disabled for a given CPU when it is needed by the running
thread during mi_switch() and sched_exit().

- Move GPROF-specific code from statclock() to a new clock interrupt
callback, gmonclock(), in subr_prof.c. Where available, each cpu_info
has its own gmonclock handle. The gmonclock is enabled/disabled for
a given CPU via sysctl(2) in prof_state_toggle().

- Both profclock() and gmonclock() have a fixed period, profclock_period,
that is initialized during initclocks().

- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
and clockintr_stagger() via <sys/clockintr.h>. They have external
callers now.

- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete
spc_pscnt and spc_psdiv. The statclock frequency is not dynamic
anymore so these variables are now useless.

- Delete code/state related to the dynamic statclock frequency from
kern_clockintr.c. The statclock frequency can still be pseudo-random,
so move the contents of clockintr_statvar_init() into clockintr_init().

With input from miod@, deraadt@, and claudio@. Early revisions
cleaned up by claudio. Early revisions tested by claudio@. Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation
bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on
powerpc64 by gkoehler@.


Revision tags: OPENBSD_7_3_BASE
# 1.141 11-Jan-2023 visa

Add TLB bypass for instruction emulation

copyinsn() fetches a userland instruction through the direct map.
This lets emulation work with execute-only virtual memory mappings.

OK deraadt@


# 1.140 19-Nov-2022 cheloha

mips64, loongson, octeon: switch to clockintr

- Remove mips64-specific clock interrupt scheduling bits from cpu_info.
- Add missing tick_nsec initialization to cpu_initclocks().
- Disable the glxclk interrupt clock on loongson. visa@/miod@ say it
can be removed later if it isn't useful for anything else.
- Wire up cp0_intrclock.

Notes:

- The loongson apm_suspend() changes are untested, but deraadt@ claims
APM suspend/resume on loongson doesn't work anyway.
- loongson and octeon now have a randomized statclock(), stathz = hz.

With input from miod@, visa@. Tested by miod@, visa@.

Link: https://marc.info/?l=openbsd-tech&m=166776379603497&w=2

ok visa@ mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.139 22-Aug-2022 cheloha

mips64, octeon, loongson: trigger deferred clock interrupts from splx(9)

As with powerpc, powerpc64, and riscv64, on mips64 platforms we need
to isolate the clock interrupt schedule from the MD clock interrupt
code. To do this, we need to stop deferring clock interrupt work
until the next tick and instead defer the work until we logically
unmask the clock interrupt from splx(9).

Add a boolean (ci_clock_deferred) to the cpu_info struct to note
whether we need to trigger the clock interrupt by hand, and then
do so from splx(9) by calling md_triggerclock().

Currently md_triggerclock is only ever set to cp0_trigger_int5(). The
routine takes great care to ensure that INT5 has fired or will fire
before returning.

There are some loongson machines that use glxclk instead of CP0. They
can be switched to use CP0 later.

With input and advice from visa@ and miod@.

Compiled and extensively tested by visa@ and miod@ on various octeon
and loongson machines. No issues seen on octeon machines. miod@ saw
some odd things on loongson, but suggests that all issues are
probably unrelated to this patch.

Link: https://marc.info/?l=openbsd-tech&m=165929192702632&w=2

ok visa@, miod@
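
The deferral scheme can be modelled in a few lines of C. This is a simplified
sketch under stated assumptions, not the mips64 code: struct cpu_sketch,
IPL_CLOCK_SKETCH, clock_interrupt() and splx_sketch() are hypothetical, and
the direct call in splx_sketch() stands in for md_triggerclock(), which on
real hardware makes INT5 fire rather than calling the handler itself.

```c
#define IPL_CLOCK_SKETCH	5	/* hypothetical level masking the clock */

struct cpu_sketch {
	int		ci_ipl;			/* current logical IPL */
	int		ci_clock_deferred;	/* clock work is pending */
	unsigned int	ci_clock_runs;		/* times clock work ran */
};

/* Clock interrupt entry: run the work, or note that it was masked. */
static void
clock_interrupt(struct cpu_sketch *ci)
{
	if (ci->ci_ipl >= IPL_CLOCK_SKETCH) {
		ci->ci_clock_deferred = 1;
		return;
	}
	ci->ci_clock_deferred = 0;
	ci->ci_clock_runs++;
}

/*
 * Lower the IPL. If clock work was deferred and the new level no longer
 * masks the clock, trigger it right away instead of waiting for the
 * next tick.
 */
static void
splx_sketch(struct cpu_sketch *ci, int newipl)
{
	ci->ci_ipl = newipl;
	if (ci->ci_clock_deferred && newipl < IPL_CLOCK_SKETCH)
		clock_interrupt(ci);	/* stands in for md_triggerclock() */
}
```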


Revision tags: OPENBSD_7_1_BASE
# 1.138 28-Jan-2022 visa

Remove unused guarded read and write routines.

No objection from miod@


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.
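
The buffer and panicstr behaviour described above might look roughly like
this in portable C11. It is only a sketch: the two-CPU array, the function
name and the use of <stdatomic.h> are assumptions for illustration, and the
kernel uses its own atomic primitives.

```c
#include <stdatomic.h>
#include <stdio.h>

#define PANICBUF_LEN_SKETCH	512
#define NCPU_SKETCH		2

struct cpu_panic_sketch {
	char ci_panicbuf[PANICBUF_LEN_SKETCH];
};

static struct cpu_panic_sketch cpus[NCPU_SKETCH];
static _Atomic(const char *) panicstr_sketch;

/* Returns 1 if this call was the first panic system-wide, else 0. */
static int
panic_sketch(int cpu, const char *msg)
{
	struct cpu_panic_sketch *ci = &cpus[cpu];
	const char *expected = NULL;

	/* Only the first panic on this CPU fills its buffer. */
	if (ci->ci_panicbuf[0] == '\0')
		snprintf(ci->ci_panicbuf, sizeof(ci->ci_panicbuf), "%s", msg);

	/* Atomically claim panicstr; this succeeds for one CPU only. */
	return atomic_compare_exchange_strong(&panicstr_sketch, &expected,
	    (const char *)ci->ci_panicbuf);
}
```

A later panic on the same CPU leaves the buffer untouched, and a panic on a
second CPU fills its own buffer without overwriting panicstr_sketch.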

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.
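
a userland sketch of what a time-based backend could look like, assuming
C11 timespec_get() in place of the kernel's nanotime(). the mixing
expression and the _sketch names are illustrative assumptions, not a copy
of the committed code.

```c
#include <time.h>

/* fold seconds and nanoseconds together; only the bit churn matters. */
static unsigned int
messybits_mix_sketch(const struct timespec *ts)
{
	return (unsigned int)ts->tv_nsec ^ ((unsigned int)ts->tv_sec << 20);
}

/* time-based fallback; an MD version could read a cycle counter instead. */
static unsigned int
cpu_rnd_messybits_sketch(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return messybits_mix_sketch(&ts);
}
```

the result is not a timestamp and carries no ordering guarantee, which is
the point made above.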

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.
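
A sketch of how such a statement might sit in a busy loop, assuming GCC or
Clang inline assembly. The macro name is hypothetical, and the
compiler-barrier fallback on non-x86 is an assumption borrowed from the
later revision 1.147; this commit left the other platforms empty.

```c
#include <stdatomic.h>

#if defined(__i386__) || defined(__x86_64__)
/* PAUSE hints to the CPU that this is a spin-wait loop. */
#define CPU_BUSY_CYCLE_SKETCH()	__asm__ volatile("pause" ::: "memory")
#else
/* Compiler barrier only; no instruction is emitted. */
#define CPU_BUSY_CYCLE_SKETCH()	__asm__ volatile("" ::: "memory")
#endif

/* Spin until another CPU (or thread) sets the flag. */
static void
spin_until_set(atomic_int *flag)
{
	while (atomic_load_explicit(flag, memory_order_acquire) == 0)
		CPU_BUSY_CYCLE_SKETCH();
}
```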

OK kettenis@


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.
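
The register-then-flush scheme can be sketched as follows. The struct, the
fixed-size pending list and the fall-back to a whole-cache flush are
assumptions for illustration; the flush itself is simulated by a counter.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH	4096u
#define MAX_PENDING_SKETCH	8

struct pmap_sketch {
	uintptr_t pending_va[MAX_PENDING_SKETCH]; /* pages awaiting a flush */
	size_t npending;
	int flush_all;			/* list overflowed: flush whole I$ */
	unsigned int flushes_done;	/* simulated flush operations */
};

/* Called for each executable mapping: only record the pending flush. */
static void
pmap_enter_exec_sketch(struct pmap_sketch *pm, uintptr_t va)
{
	if (pm->npending < MAX_PENDING_SKETCH)
		pm->pending_va[pm->npending++] =
		    va & ~(uintptr_t)(PAGE_SIZE_SKETCH - 1);
	else
		pm->flush_all = 1;
}

/* Called once after a batch of pmap_enter() calls: do the real work. */
static void
pmap_update_sketch(struct pmap_sketch *pm)
{
	if (pm->flush_all)
		pm->flushes_done++;		/* one whole-cache flush */
	else
		pm->flushes_done += pm->npending; /* one flush per page */
	pm->npending = 0;
	pm->flush_all = 0;
}
```

With this split, a run of neighbouring faults costs at most one pass over
the pending list instead of one flush per pmap_enter() call.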

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to work around the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by creating wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
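
A sketch of what such a common struct could look like. The field names and
the helper are assumptions for illustration, with "sets" used in the sense
of ways as in the message above.

```c
struct cache_info_sketch {
	unsigned int size;	/* total size in bytes */
	unsigned int linesize;	/* line size in bytes */
	unsigned int setsize;	/* size of one set (way) in bytes */
	unsigned int sets;	/* number of sets (ways) */
};

/* Fill in one cache description; setsize is derived for convenience. */
static void
cache_info_init_sketch(struct cache_info_sketch *c, unsigned int size,
    unsigned int linesize, unsigned int sets)
{
	c->size = size;
	c->linesize = linesize;
	c->sets = sets;
	c->setsize = sets != 0 ? size / sets : 0;
}
```

One instance per cache (I$, D$, L2, L3) would then let reporting code print
the way count from data rather than from the processor type.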


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.
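
The dynamic check can be sketched as a pure range test. The struct and the
exact boundary handling (exclusive at the base, inclusive at the top, for a
stack that grows down) are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

struct altstack_sketch {
	uintptr_t ss_sp;	/* base (lowest address) of the altstack */
	size_t ss_size;		/* size in bytes; 0 means none installed */
};

/* Nonzero if sp currently lies within the alternate signal stack. */
static int
on_altstack_sketch(const struct altstack_sketch *ss, uintptr_t sp)
{
	return ss->ss_size != 0 && sp > ss->ss_sp &&
	    sp - ss->ss_sp <= ss->ss_size;
}
```

Because the answer is computed from the current stack pointer, a longjmp
out of the handler back to the main stack is reflected immediately, with no
stored flag to go stale.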

Testing and corrections by miod@, krw@, tobiasu@, pirofti@


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, we won't use the physical cpuid to fetch
curcpu(). This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns
the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and
also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.141 11-Jan-2023 visa

Add TLB bypass for instruction emulation

copyinsn() fetches a userland instruction through the direct map.
This lets emulation work with execute-only virtual memory mappings.

OK deraadt@


# 1.140 19-Nov-2022 cheloha

mips64, loongson, octeon: switch to clockintr

- Remove mips64-specific clock interrupt scheduling bits from cpu_info.
- Add missing tick_nsec initialization to cpu_initclocks().
- Disable the glxclk interrupt clock on loongson. visa@/miod@ say it
can be removed later if it isn't useful for anything else.
- Wire up cp0_intrclock.

Notes:

- The loongson apm_suspend() changes are untested, but deraadt@ claims
APM suspend/resume on loongson doesn't work anyway.
- loongson and octeon now have a randomized statclock(), stathz = hz.

With input from miod@, visa@. Tested by miod@, visa@.

Link: https://marc.info/?l=openbsd-tech&m=166776379603497&w=2

ok visa@ mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.139 22-Aug-2022 cheloha

mips64, octeon, loongson: trigger deferred clock interrupts from splx(9)

As with powerpc, powerpc64, and riscv64, on mips64 platforms we need
to isolate the clock interrupt schedule from the MD clock interrupt
code. To do this, we need to stop deferring clock interrupt work
until the next tick and instead defer the work until we logically
unmask the clock interrupt from splx(9).

Add a boolean (ci_clock_deferred) to the cpu_info struct to note
whether we need to trigger the clock interrupt by hand, and then
do so from splx(9) by calling md_triggerclock().

Currently md_triggerclock is only ever set to cp0_trigger_int5(). The
routine takes great care to ensure that INT5 has fired or will fire
before returning.

There are some loongson machines that use glxclk instead of CP0. They
can be switched to use CP0 later.

With input and advice from visa@ and miod@.

Compiled and extensively tested by visa@ and miod@ on various octeon
and loongson machines. No issues seen on octeon machines. miod@ saw
some odd things on loongson, but suggests that all issues are
probably unrelated to this patch.

Link: https://marc.info/?l=openbsd-tech&m=165929192702632&w=2

ok visa@, miod@


Revision tags: OPENBSD_7_1_BASE
# 1.138 28-Jan-2022 visa

Remove unused guarded read and write routines.

No objection from miod@


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@
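
The "first panic wins" buffer behaviour described in this entry can be
sketched as follows. The struct and function names are hypothetical; only
the 512-byte size comes from the commit message. The sketch treats an empty
first byte as "no message recorded yet", which is an assumption made for
the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SK_PANICBUF_SIZE 512	/* buffer size named in the commit message */

/* Hypothetical stand-in for the relevant part of struct cpu_info. */
struct panic_cpu {
	char panicbuf[SK_PANICBUF_SIZE];
};

/*
 * Record a panic message for this CPU. Returns 1 if the message was
 * stored, 0 if an earlier message is already present and was kept.
 */
static int
sk_record_panic(struct panic_cpu *ci, const char *msg)
{
	assert(ci != NULL && msg != NULL);
	if (ci->panicbuf[0] != '\0')
		return 0;	/* a later panic must not overwrite the first */
	snprintf(ci->panicbuf, sizeof(ci->panicbuf), "%s", msg);
	return 1;
}
```

Because each CPU owns its buffer, two CPUs panicking at the same time write
to different memory and cannot garble each other's message.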


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
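
The dynamic check described in this entry can be sketched as a simple range
test. The struct layout, the function name and the half-open range
[base, base + size) are assumptions made for illustration; the exact
boundary convention used by the kernel is not stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a registered alternate signal stack. */
struct sk_altstack {
	uintptr_t base;	/* lowest address of the stack area */
	size_t size;	/* size in bytes; 0 means none registered */
};

/*
 * Return 1 if sp lies inside the alternate stack, 0 otherwise.
 * Nothing is cached: the answer depends only on the current sp, so it
 * stays correct after a longjmp out of a signal handler.
 */
static int
sk_on_altstack(const struct sk_altstack *as, uintptr_t sp)
{
	assert(as != NULL);
	if (as->size == 0)
		return 0;
	/* Unsigned subtraction wraps for sp < base, failing the test. */
	return (sp - as->base) < as->size;
}
```

A stored "on stack" flag would go stale when a handler leaves via longjmp;
recomputing from the stack pointer avoids that.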


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing current cpu_info address into LLAddr register, for curcpu().
Unlike the previous implementation, we won't use physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() renamed to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields an 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in December 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.140 19-Nov-2022 cheloha

mips64, loongson, octeon: switch to clockintr

- Remove mips64-specific clock interrupt scheduling bits from cpu_info.
- Add missing tick_nsec initialization to cpu_initclocks().
- Disable the glxclk interrupt clock on loongson. visa@/miod@ say it
can be removed later if it isn't useful for anything else.
- Wire up cp0_intrclock.

Notes:

- The loongson apm_suspend() changes are untested, but deraadt@ claims
APM suspend/resume on loongson doesn't work anyway.
- loongson and octeon now have a randomized statclock(), stathz = hz.

With input from miod@, visa@. Tested by miod@, visa@.

Link: https://marc.info/?l=openbsd-tech&m=166776379603497&w=2

ok visa@ mlarkin@


Revision tags: OPENBSD_7_2_BASE
# 1.139 22-Aug-2022 cheloha

mips64, octeon, loongson: trigger deferred clock interrupts from splx(9)

As with powerpc, powerpc64, and riscv64, on mips64 platforms we need
to isolate the clock interrupt schedule from the MD clock interrupt
code. To do this, we need to stop deferring clock interrupt work
until the next tick and instead defer the work until we logically
unmask the clock interrupt from splx(9).

Add a boolean (ci_clock_deferred) to the cpu_info struct to note
whether we need to trigger the clock interrupt by hand, and then
do so from splx(9) by calling md_triggerclock().

Currently md_triggerclock is only ever set to cp0_trigger_int5(). The
routine takes great care to ensure that INT5 has fired or will fire
before returning.

There are some loongson machines that use glxclk instead of CP0. They
can be switched to use CP0 later.

With input and advice from visa@ and miod@.

Compiled and extensively tested by visa@ and miod@ on various octeon
and loongson machines. No issues seen on octeon machines. miod@ saw
some odd things on loongson, but suggests that all issues are
probably unrelated to this patch.

Link: https://marc.info/?l=openbsd-tech&m=165929192702632&w=2

ok visa@, miod@
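
Illustration (not part of the commit): a minimal user-space sketch of the
deferral described above, assuming a simplified model in which a clock
interrupt arriving while masked only sets a flag, and lowering the level
re-triggers it. All sk_* names and the level values are hypothetical;
sk_clock_intr() merely stands in for md_triggerclock().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical priority levels; real IPL values differ per platform. */
#define SK_IPL_NONE	0
#define SK_IPL_CLOCK	5

/* Hypothetical per-CPU state, loosely modelled on the commit text. */
struct sk_cpu {
	int	ipl;		/* current logical interrupt level */
	bool	clock_deferred;	/* clock fired while it was masked */
	int	clock_runs;	/* how many times the clock work ran */
};

/* Clock interrupt entry: run the work, or note that it was masked. */
static void
sk_clock_intr(struct sk_cpu *ci)
{
	if (ci->ipl >= SK_IPL_CLOCK) {
		ci->clock_deferred = true;
		return;
	}
	ci->clock_runs++;
}

/* Lower the level; re-trigger the clock if it was deferred meanwhile. */
static void
sk_splx(struct sk_cpu *ci, int newipl)
{
	ci->ipl = newipl;
	if (ci->clock_deferred && newipl < SK_IPL_CLOCK) {
		ci->clock_deferred = false;
		sk_clock_intr(ci);	/* stands in for md_triggerclock() */
	}
}
```

In this model the deferred work runs as soon as the level drops below the
clock level, rather than waiting for the next tick.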


Revision tags: OPENBSD_7_1_BASE
# 1.138 28-Jan-2022 visa

Remove unused guarded read and write routines.

No objection from miod@


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@
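
Illustration (not part of the commit): a minimal user-space sketch of the
"first panic wins" behaviour described above, assuming C11 atomics. The
sk_* names, the CPU count and the use of snprintf() are hypothetical; only
the 512-byte buffer size and the idea of setting panicstr atomically come
from the commit text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define SK_PANICBUF_LEN	512	/* buffer size named in the commit */
#define SK_NCPU		2	/* hypothetical CPU count */

/* Hypothetical stand-in for the per-CPU structure. */
struct sk_cpu {
	char	panicbuf[SK_PANICBUF_LEN];
};

static struct sk_cpu sk_cpus[SK_NCPU];

/* Points at the message of the first CPU to panic, NULL until then. */
static _Atomic(const char *) sk_panicstr;

static void
sk_record_panic(int cpu, const char *msg)
{
	struct sk_cpu *ci = &sk_cpus[cpu];
	const char *expected = NULL;

	/* Only the first panic on this CPU writes its buffer. */
	if (ci->panicbuf[0] == '\0')
		snprintf(ci->panicbuf, sizeof(ci->panicbuf), "%s", msg);

	/* Only the first CPU to get here claims the global pointer. */
	atomic_compare_exchange_strong(&sk_panicstr, &expected,
	    (const char *)ci->panicbuf);
}
```

The compare-and-swap is what identifies the first CPU to panic; later
callers leave both the global pointer and their already-filled buffer alone.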


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
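
Illustration (not part of the commit): a minimal sketch of how such a
statement sits in a busy loop, assuming a GCC or Clang compiler. The
SK_BUSY_CYCLE name, the bounded loop and the exact inline assembly are
assumptions made for this example; the commit only states that PAUSE is
used on i386/amd64 and that the macro is empty elsewhere.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical analogue of CPU_BUSY_CYCLE(): PAUSE on x86, empty elsewhere. */
#if defined(__i386__) || defined(__x86_64__)
#define SK_BUSY_CYCLE()	__asm__ __volatile__("pause" ::: "memory")
#else
#define SK_BUSY_CYCLE()	do { /* nothing */ } while (0)
#endif

/*
 * Spin until *flag becomes nonzero or max iterations have passed.
 * Returns the number of iterations spent waiting.
 */
static unsigned int
sk_spin_until_set(atomic_int *flag, unsigned int max)
{
	unsigned int n = 0;

	while (atomic_load(flag) == 0 && n < max) {
		SK_BUSY_CYCLE();
		n++;
	}
	return n;
}
```

The iteration bound exists only so the example terminates on its own; a
real spin loop would normally wait on the condition alone.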


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
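
Illustration (not part of the commit): a minimal sketch of the
register-now, flush-later idea described above. Merging pending requests
into a single bounding range is an assumption made for this example, and
all sk_* names are hypothetical; the commit does not say how pending
flushes are recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of I$ flushes that have been requested but not done. */
struct sk_pending {
	uintptr_t	start;	/* lowest pending address */
	uintptr_t	end;	/* one past the highest pending address */
	int		valid;	/* nonzero if anything is pending */
};

static int sk_flushes;		/* counts simulated cache flushes */

/* pmap_enter() analogue: only remember that [va, va + len) needs a flush. */
static void
sk_register_flush(struct sk_pending *p, uintptr_t va, size_t len)
{
	uintptr_t end = va + len;

	if (!p->valid) {
		p->start = va;
		p->end = end;
		p->valid = 1;
		return;
	}
	if (va < p->start)
		p->start = va;
	if (end > p->end)
		p->end = end;
}

/* pmap_update() analogue: perform one flush for everything pending. */
static void
sk_update(struct sk_pending *p)
{
	if (!p->valid)
		return;
	sk_flushes++;	/* a real kernel would flush [start, end) here */
	p->valid = 0;
}
```

Three neighbouring page mappings followed by one update thus cost a single
flush in this model instead of three.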


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
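
Illustration (not part of the commit): a minimal sketch of the dynamic
check described above, assuming the alternate stack is described by a base
address and a size. The struct and function names are hypothetical, and
the kernel's actual test may treat the boundaries differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an alternate signal stack. */
struct sk_altstack {
	uintptr_t	base;	/* lowest address of the stack area */
	size_t		size;	/* size of the area in bytes, 0 if unset */
};

/*
 * Return nonzero if sp lies within [base, base + size).
 * The unsigned subtraction wraps to a huge value when sp < base,
 * so one comparison covers both bounds.
 */
static int
sk_on_altstack(const struct sk_altstack *as, uintptr_t sp)
{
	return as->size != 0 && sp - as->base < as->size;
}
```

Because the answer is computed from the current stack pointer instead of a
saved flag, it stays correct after a longjmp out of the signal handler.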


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful set of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' errata as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing current cpu_info address into LLAddr register, for curcpu().
Unlike the previous implementation, we won't use the physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns XKPHYS address of allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.139 22-Aug-2022 cheloha

mips64, octeon, loongson: trigger deferred clock interrupts from splx(9)

As with powerpc, powerpc64, and riscv64, on mips64 platforms we need
to isolate the clock interrupt schedule from the MD clock interrupt
code. To do this, we need to stop deferring clock interrupt work
until the next tick and instead defer the work until we logically
unmask the clock interrupt from splx(9).

Add a boolean (ci_clock_deferred) to the cpu_info struct to note
whether we need to trigger the clock interrupt by hand, and then
do so from splx(9) by calling md_triggerclock().
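
The deferral described above can be sketched roughly as follows. This is a
simplified model, not the kernel code: struct cpu_sketch, the IPL values and
both functions are hypothetical, and the direct call to clock_intr() merely
stands in for md_triggerclock() raising the interrupt again.

```c
#include <assert.h>
#include <stddef.h>

#define IPL_NONE	0	/* hypothetical numeric values */
#define IPL_CLOCK	5

struct cpu_sketch {
	int ipl;		/* current logical interrupt level */
	int clock_deferred;	/* models ci_clock_deferred */
	int clock_runs;		/* times the clock work actually ran */
};

/* Clock interrupt arrives: run the work, or note that it was masked. */
static void
clock_intr(struct cpu_sketch *ci)
{
	assert(ci != NULL);
	if (ci->ipl >= IPL_CLOCK) {
		ci->clock_deferred = 1;	/* masked: defer until splx */
		return;
	}
	ci->clock_deferred = 0;
	ci->clock_runs++;
}

/* Lower the level; if clock work was deferred and is now unmasked, fire it. */
static void
splx_sketch(struct cpu_sketch *ci, int newipl)
{
	assert(ci != NULL);
	ci->ipl = newipl;
	if (ci->clock_deferred && newipl < IPL_CLOCK)
		clock_intr(ci);	/* stands in for md_triggerclock() */
}
```

In this model the deferred work runs as soon as the level drops below
IPL_CLOCK, rather than waiting for the next tick.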

Currently md_triggerclock is only ever set to cp0_trigger_int5(). The
routine takes great care to ensure that INT5 has fired or will fire
before returning.

There are some loongson machines that use glxclk instead of CP0. They
can be switched to use CP0 later.

With input and advice from visa@ and miod@.

Compiled and extensively tested by visa@ and miod@ on various octeon
and loongson machines. No issues seen on octeon machines. miod@ saw
some odd things on loongson, but suggests that all issues are
probably unrelated to this patch.

Link: https://marc.info/?l=openbsd-tech&m=165929192702632&w=2

ok visa@, miod@


Revision tags: OPENBSD_7_1_BASE
# 1.138 28-Jan-2022 visa

Remove unused guarded read and write routines.

No objection from miod@


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.
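
A minimal sketch of the "first panic wins" rule for the per-CPU buffer. Only
the 512-byte size comes from the text above; struct cpu_sketch and
record_panic() are hypothetical names, and the real panic(9) path does more
than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PANICBUF_SIZE	512	/* size stated in the commit message */

struct cpu_sketch {
	char panicbuf[PANICBUF_SIZE];	/* models ci_panicbuf */
};

/*
 * Store msg only if this CPU has not recorded a panic yet.
 * Returns 1 if the message was stored, 0 if the buffer was left alone.
 */
static int
record_panic(struct cpu_sketch *ci, const char *msg)
{
	assert(ci != NULL && msg != NULL);
	if (ci->panicbuf[0] != '\0')
		return 0;	/* a later panic: keep the first message */
	snprintf(ci->panicbuf, sizeof(ci->panicbuf), "%s", msg);
	return 1;
}
```

Because each CPU writes only to its own buffer, simultaneous panics on
different CPUs cannot interleave their messages in this model.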

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@
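
As an illustration of why this matters, here is a small sketch with two
hypothetical macros (neither is taken from cpu.h). Without the parentheses,
an expression argument is regrouped by operator precedence after expansion.

```c
#include <assert.h>

/* Hypothetical macros, for illustration only. */
#define DOUBLE_UNGUARDED(x)	(x * 2)		/* argument not parenthesized */
#define DOUBLE_GUARDED(x)	((x) * 2)	/* argument parenthesized */

static int
unguarded_result(void)
{
	/* Expands to (1 + 2 * 2): the multiplication binds first. */
	return DOUBLE_UNGUARDED(1 + 2);
}

static int
guarded_result(void)
{
	/* Expands to ((1 + 2) * 2): the argument stays grouped. */
	return DOUBLE_GUARDED(1 + 2);
}
```

Here unguarded_result() evaluates to 5 while guarded_result() evaluates to 6.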


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.
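
A rough sketch of how such a statement sits in a busy loop, assuming GCC-style
inline assembly. The macro bodies follow the description above (PAUSE on
i386/amd64, empty elsewhere) but are not copied from the tree, and
spin_until_set() with its spin limit is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__i386__) || defined(__x86_64__)
/* PAUSE hints to the CPU that this is a spin-wait loop. */
#define CPU_BUSY_CYCLE()	__asm__ __volatile__ ("pause" ::: "memory")
#else
/* Assumption: nothing to do on other architectures. */
#define CPU_BUSY_CYCLE()	do { /* nothing */ } while (0)
#endif

/*
 * Spin until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent spinning.
 */
static int
spin_until_set(volatile int *flag, int max_spins)
{
	int spins = 0;

	assert(flag != NULL);
	while (*flag == 0 && spins < max_spins) {
		CPU_BUSY_CYCLE();
		spins++;
	}
	return spins;
}
```

The spin limit is only there to keep the sketch terminating; a real lock loop
would normally spin until the condition holds.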

OK kettenis@


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.
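
The deferral can be pictured with the following sketch. It is not the pmap
code: struct pending_flush and both functions are hypothetical, and merging
all registered pages into one covering range is just one possible policy,
assumed here for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping: one pending I$ flush range per pmap. */
struct pending_flush {
	unsigned long start;	/* first va to flush */
	unsigned long end;	/* one past the last va to flush */
	int valid;		/* nonzero if a flush is pending */
	int flushes_done;	/* counts actual flush operations */
};

/* Called where pmap_enter() would otherwise flush immediately. */
static void
register_flush(struct pending_flush *pf, unsigned long va, unsigned long len)
{
	assert(pf != NULL);
	if (!pf->valid) {
		pf->start = va;
		pf->end = va + len;
		pf->valid = 1;
		return;
	}
	/* Grow the pending range so it also covers the new page. */
	if (va < pf->start)
		pf->start = va;
	if (va + len > pf->end)
		pf->end = va + len;
}

/* Called where pmap_update() would run: one flush for the whole range. */
static void
perform_flush(struct pending_flush *pf)
{
	assert(pf != NULL);
	if (!pf->valid)
		return;
	/* A real kernel would invalidate [start, end) in the I$ here. */
	pf->flushes_done++;
	pf->valid = 0;
}
```

Three neighbouring pages registered in a row then cost a single flush
operation instead of three.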

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.
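
The comparison described above can be sketched as a single range check.
The function name and parameters are assumptions for this example; the
kernel takes the base and size from the process's signal stack state.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return nonzero if sp lies within [base, base + size).
 * The unsigned subtraction wraps to a huge value when sp < base,
 * so one comparison covers both bounds.
 */
static int
sp_on_altstack(uintptr_t sp, uintptr_t base, size_t size)
{
	return (sp - base) < size;
}
```

Because the answer is computed from the current stack pointer, nothing
needs to be cleared when a handler is left with longjmp.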

Testing and corrections by miod@, krw@, tobiasu@, pirofti@


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing current cpu_info address into LLAddr register, for curcpu().
Unlike the previous implementation, we won't use physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() renamed to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
It is meant to avoid using virtual addresses on secondary cpus in early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.138 28-Jan-2022 visa

Remove unused guarded read and write routines.

No objection from miod@


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.
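
The first-writer-wins rule for ci_panicbuf can be sketched as follows.
Only the field name and the 512-byte size come from this message; struct
cpu_info_sketch and record_panic() are hypothetical, and the real panic(9)
path does considerably more.

```c
#include <stdio.h>
#include <string.h>

#define PANICBUF_SIZE	512

/* Hypothetical, trimmed-down stand-in for struct cpu_info. */
struct cpu_info_sketch {
	char ci_panicbuf[PANICBUF_SIZE];
};

/*
 * Record a panic message for this CPU. Only the first message is kept;
 * returns 1 if the buffer was written, 0 if it already held a message.
 */
static int
record_panic(struct cpu_info_sketch *ci, const char *msg)
{
	if (ci->ci_panicbuf[0] != '\0')
		return 0;
	snprintf(ci->ci_panicbuf, sizeof(ci->ci_panicbuf), "%s", msg);
	return 1;
}
```

A later panic on the same CPU would still be printed to the console by
the caller; it just leaves the stored message alone.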

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
arent kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.
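
a nanotime-style backend of this kind might look like the userland sketch
below. the xor/shift mixing, the function names and the use of C11
timespec_get() in place of the kernel's nanotime() are all assumptions for
illustration, not the committed code.

```c
#include <stdint.h>
#include <time.h>

/* Fold seconds and nanoseconds into one word of fast-changing bits. */
static uint32_t
mix_timespec(const struct timespec *ts)
{
	return (uint32_t)ts->tv_nsec ^ ((uint32_t)ts->tv_sec << 20);
}

/* Hypothetical userland analogue of a nanotime-backed cpu_rnd_messybits(). */
static uint32_t
messybits_sketch(void)
{
	struct timespec ts;

	/* Accuracy does not matter here, only that the bits change. */
	timespec_get(&ts, TIME_UTC);
	return mix_timespec(&ts);
}
```

an MD replacement would keep the same signature and simply return the
low bits of a cycle counter instead.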

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
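
As a rough illustration of the intended use of CPU_BUSY_CYCLE() described
above, the sketch below calls a busy-cycle hook in the body of a spin loop.
The macro body (a bare compiler barrier in GCC/Clang syntax) and the helper
spin_until_set() are assumptions made for this example, not the kernel's
definitions.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for the MD macro: an empty asm statement with a
 * "memory" clobber acts as a compiler barrier only (GCC/Clang syntax).
 */
#define CPU_BUSY_CYCLE()	__asm__ volatile("" ::: "memory")

/*
 * Spin until *flag becomes nonzero or max_spins iterations elapse.
 * Returns the number of iterations spent waiting.
 */
static unsigned long
spin_until_set(atomic_int *flag, unsigned long max_spins)
{
	unsigned long n = 0;

	while (atomic_load_explicit(flag, memory_order_acquire) == 0 &&
	    n < max_spins) {
		CPU_BUSY_CYCLE();	/* hint: we are only waiting here */
		n++;
	}
	return n;
}
```

On an architecture with a dedicated instruction (the log mentions PAUSE on
i386/amd64), the macro body would presumably issue that instruction instead.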


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to work around the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
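
A minimal sketch of such a common per-cache descriptor follows. The struct
and function names are hypothetical; only the four quantities (line size,
number of sets, total size, set size) come from the log entry above.

```c
#include <stdint.h>

/* Hypothetical per-cache descriptor; one instance per I$, D$, L2, L3. */
struct cache_desc {
	uint32_t size;		/* total size in bytes */
	uint32_t linesize;	/* line size in bytes */
	uint32_t sets;		/* number of ways ("sets" in the log) */
	uint32_t setsize;	/* bytes per way, kept for convenience */
};

/* Fill in a descriptor; setsize is derived as size / sets. */
static void
cache_desc_init(struct cache_desc *c, uint32_t size, uint32_t linesize,
    uint32_t sets)
{
	c->size = size;
	c->linesize = linesize;
	c->sets = sets;
	c->setsize = (sets != 0) ? size / sets : 0;
}

/* Number of lines in one way, e.g. for a flush-by-index loop. */
static uint32_t
cache_lines_per_set(const struct cache_desc *c)
{
	return (c->linesize != 0) ? c->setsize / c->linesize : 0;
}
```

With the way count stored next to the sizes, attach-time code can print it
from the descriptor rather than switching on the processor type.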


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
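
The dynamic check described above amounts to a range test on the stack
pointer. The helper below is a sketch under assumptions: its name and the
half-open interval [base, base + size) are illustrative choices, not taken
from the kernel source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if sp lies within [base, base + size).
 * The unsigned subtraction wraps around when sp < base, so a single
 * comparison covers both the lower and the upper bound.
 */
static bool
sp_on_altstack(uintptr_t sp, uintptr_t base, size_t size)
{
	return (sp - base) < size;
}
```

Because the answer is computed from the current stack pointer rather than
from a saved flag, it stays correct after a longjmp out of the handler.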


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, the physical cpuid is no longer used to
fetch curcpu(). This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
It is meant to avoid using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanic
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.137 07-Oct-2021 visa

Remove unused TLB routines.


Revision tags: OPENBSD_7_0_BASE
# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.
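
The first-writer-wins behaviour described above can be sketched in portable
C11 as follows. This is an illustration under assumptions, not the kernel
code: struct cpu_state, first_panic and record_panic() are hypothetical
names, and only the 512-byte buffer size comes from the log.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define PANICBUF_LEN	512

/* Hypothetical per-CPU state holding that CPU's first panic message. */
struct cpu_state {
	char panicbuf[PANICBUF_LEN];
};

/* Points at the first panic message system-wide; NULL until a panic. */
static _Atomic(const char *) first_panic = NULL;

/*
 * Record a panic message for the given CPU.
 * Returns 1 if this call was the first panic system-wide, else 0.
 */
static int
record_panic(struct cpu_state *ci, const char *msg)
{
	const char *expected = NULL;

	/* Only the first panic on this CPU may write its buffer. */
	if (ci->panicbuf[0] == '\0')
		snprintf(ci->panicbuf, sizeof(ci->panicbuf), "%s", msg);

	/* Exactly one caller wins the exchange and becomes "first". */
	return atomic_compare_exchange_strong(&first_panic, &expected,
	    ci->panicbuf);
}
```

A later panic on the same CPU leaves the buffer untouched, and a panic on a
second CPU fills its own buffer without disturbing the first one.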

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.
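
A userland sketch of what a time-based "messy bits" backend might look like
follows. The mixing expression and the names messybits_mix() and
messybits_sketch() are assumptions for illustration, and timespec_get()
stands in for the kernel's nanotime(); this is not copied from cpu.h.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Fold seconds and nanoseconds into one word.  The result is not a
 * timestamp; it only needs bits that change quickly between calls.
 */
static unsigned int
messybits_mix(int64_t sec, long nsec)
{
	return (unsigned int)((uint64_t)nsec ^ ((uint64_t)sec << 20));
}

/* Hypothetical stand-in for a nanotime-based backend. */
static unsigned int
messybits_sketch(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);	/* C11; replaces nanotime() here */
	return messybits_mix(ts.tv_sec, ts.tv_nsec);
}
```

An MD replacement would swap the body of messybits_sketch() for a cycle
counter read, leaving callers unchanged.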

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@
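
Why the parentheses matter can be seen with a pair of hypothetical macros
(neither one appears in cpu.h; they exist only for this illustration):

```c
#include <assert.h>

/* Argument not guarded: the expansion of (1 + 2) is (1 + 2 * 4). */
#define SCALE_UNGUARDED(x)	(x * 4)

/* Argument guarded: the expansion of (1 + 2) is ((1 + 2) * 4). */
#define SCALE_GUARDED(x)	((x) * 4)
```

With the argument 1 + 2 the unguarded form evaluates to 9 and the guarded
form to 12, since multiplication binds tighter than addition.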


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.
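
A busy loop using such a statement might look like the sketch below. The
macro body here is a GCC/Clang compiler barrier, which is an assumption
borrowed from common practice and not what this revision used on mips64
(the message says the macro started out empty outside i386/amd64).
BUSY_CYCLE_SKETCH() and spin_until_set() are hypothetical names.

```c
#include <assert.h>

/* Compiler barrier: forces *flag to be re-read on each iteration. */
#define BUSY_CYCLE_SKETCH()	__asm__ __volatile__("" ::: "memory")

/*
 * Spin until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting.
 */
static int
spin_until_set(const int *flag, int max_spins)
{
	int spins = 0;

	while (*flag == 0 && spins < max_spins) {
		BUSY_CYCLE_SKETCH();
		spins++;
	}
	return spins;
}
```

On a CPU with a pause-style instruction the macro body would carry that
instruction as well, so the loop itself does not need to change.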

OK kettenis@


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.
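
The register-now, flush-later idea can be sketched as follows. Everything
here is hypothetical (the struct, the function names, the fixed-size list
and the fall-back to a full flush); it shows the general shape of deferred
flushing, not the actual pmap code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8	/* arbitrary bound chosen for this sketch */

struct icache_pending {
	uintptr_t va[MAX_PENDING];	/* pages awaiting an I$ flush */
	size_t count;
	int flush_all;			/* list overflowed: flush everything */
};

/* Called where pmap_enter() would have flushed immediately. */
static void
pending_register(struct icache_pending *p, uintptr_t va)
{
	size_t i;

	if (p->flush_all)
		return;
	for (i = 0; i < p->count; i++)
		if (p->va[i] == va)
			return;		/* already queued */
	if (p->count == MAX_PENDING) {
		p->flush_all = 1;	/* cheaper to flush the whole cache */
		p->count = 0;
		return;
	}
	p->va[p->count++] = va;
}

/*
 * Called where pmap_update() would run.  Returns how many flush
 * operations would be issued, then resets the pending state.
 */
static unsigned int
pending_update(struct icache_pending *p)
{
	unsigned int ops = p->flush_all ? 1 : (unsigned int)p->count;

	p->count = 0;
	p->flush_all = 0;
	return ops;
}
```

Batching gives the cache code a chance to merge duplicate requests or to
replace many small flushes with a single full one.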

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
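
A per-cache record of this kind might be laid out as in the sketch below.
The struct and field names are illustrative assumptions rather than a copy
of the definition in cpu.h; the point is that the set size can be derived
from the total size and the number of sets instead of being hardcoded.

```c
#include <assert.h>

/* Hypothetical layout, one instance per cache (I$, D$, L2, L3). */
struct cache_info_sketch {
	unsigned int size;	/* total cache size in bytes */
	unsigned int linesize;	/* line size in bytes */
	unsigned int setsize;	/* size of one set (way) in bytes */
	unsigned int sets;	/* number of sets (ways) */
};

/* Fill in a record; sets must be nonzero. */
static void
cache_info_init(struct cache_info_sketch *c, unsigned int size,
    unsigned int linesize, unsigned int sets)
{
	c->size = size;
	c->linesize = linesize;
	c->sets = sets;
	c->setsize = size / sets;	/* stored for convenience */
}
```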


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.
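
The dynamic check amounts to a range test on the stack pointer. A minimal
sketch, assuming a half-open range [base, base + size) and using the
hypothetical name on_altstack(); the kernel's exact boundary handling may
differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return nonzero if sp lies inside the alternate signal stack.
 * Comparing sp - base against size avoids overflow in base + size.
 */
static int
on_altstack(uintptr_t sp, uintptr_t base, size_t size)
{
	return sp >= base && sp - base < size;
}
```

Since nothing is cached, a longjmp out of the handler changes the stack
pointer and the next check gives the right answer without extra work.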

Testing and corrections by miod@, krw@, tobiasu@, pirofti@


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful set of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, we no longer use the physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraad@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.136 24-Jul-2021 visa

Replace cpus_running with CPU_IS_RUNNING().


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@
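
The "first panic wins" rule for the per-CPU buffer can be sketched as below. This is an illustration under assumptions, not the committed code: `struct cpu_info_sketch` and `record_panic()` are hypothetical names, and only the 512-byte size and the `ci_panicbuf` field name come from the commit message.

```c
#include <stdio.h>
#include <string.h>

#define PANICBUF_LEN 512	/* size stated in the commit message */

/* Hypothetical, cut-down cpu_info holding only the panic buffer. */
struct cpu_info_sketch {
	char ci_panicbuf[PANICBUF_LEN];
};

/*
 * Record a panic message for this CPU. Only the first message is stored;
 * later calls leave the buffer untouched.
 * Returns 1 if the buffer was written, 0 otherwise.
 */
static int
record_panic(struct cpu_info_sketch *ci, const char *msg)
{
	if (ci->ci_panicbuf[0] != '\0')
		return 0;	/* an earlier panic already owns the buffer */
	snprintf(ci->ci_panicbuf, sizeof(ci->ci_panicbuf), "%s", msg);
	return 1;
}
```

Because each CPU has its own buffer, two CPUs panicking at the same time never write to the same memory, which is the garbling the commit message sets out to avoid.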


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.
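
A userland sketch of a time-based "messy bits" source is shown below. It is only an approximation of the idea: the kernel uses nanotime(9), whereas this sketch assumes POSIX `clock_gettime()`; the names `mix_timespec()` and `messybits_sketch()` and the shift of 20 are choices made for the illustration, not taken from the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Fold seconds and nanoseconds into one word. The result is not a
 * timestamp; it only needs to contain bits that change between calls.
 */
static unsigned int
mix_timespec(const struct timespec *ts)
{
	return (unsigned int)((uint64_t)ts->tv_nsec ^
	    ((uint64_t)ts->tv_sec << 20));
}

/* Read the clock and return the mixed bits (stand-in for the MD hook). */
static unsigned int
messybits_sketch(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	return mix_timespec(&ts);
}
```

A machine-dependent replacement would swap the body of `messybits_sketch()` for a cycle-counter read, which is the second step the commit message anticipates.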


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
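
How such a statement sits in a busy loop can be sketched as follows. Assumptions: the macro here is a bare compiler barrier written with GCC/Clang inline-asm syntax (this commit left the macro empty on non-x86 platforms; the barrier reflects the later revision 1.147 at the top of this log), and `CPU_BUSY_CYCLE_SKETCH` and `spin_wait()` are hypothetical names.

```c
/* Compiler barrier only; GCC/Clang extension, assumed for this sketch. */
#define CPU_BUSY_CYCLE_SKETCH()	__asm__ volatile ("" ::: "memory")

/*
 * Spin until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting.
 */
static unsigned long
spin_wait(const int *flag, unsigned long max_spins)
{
	unsigned long spins = 0;

	while (*flag == 0 && spins < max_spins) {
		CPU_BUSY_CYCLE_SKETCH();	/* forces *flag to be reloaded */
		spins++;
	}
	return spins;
}
```

On i386/amd64 the macro body would additionally issue PAUSE, as the commit message states; the loop shape stays the same.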


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
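
A per-cache descriptor of the kind described here might look like the sketch below. The struct and function names are hypothetical; only the four quantities (line size, number of sets, total size, set size) come from the commit message, which uses "sets" to mean ways.

```c
/* Hypothetical descriptor for one cache (I$, D$, L2 or L3). */
struct cache_info_sketch {
	unsigned int size;	/* total size in bytes */
	unsigned int linesize;	/* line size in bytes */
	unsigned int sets;	/* number of ways ("sets" in the commit) */
	unsigned int setsize;	/* size of one way, kept for convenience */
};

/* Fill in a descriptor, deriving setsize from size and sets. */
static struct cache_info_sketch
cache_describe(unsigned int size, unsigned int linesize, unsigned int sets)
{
	struct cache_info_sketch c;

	c.size = size;
	c.linesize = linesize;
	c.sets = sets;
	c.setsize = (sets != 0) ? size / sets : 0;
	return c;
}
```

With one such struct per cache level, reporting code can print the way count straight from the descriptor instead of switching on the processor type.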


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
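
The dynamic check described above amounts to a range test on the stack pointer. A minimal sketch, assuming a hypothetical helper `on_altstack()` and a half-open range [base, base + size); the exact boundary handling in the kernel may differ:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return nonzero if sp lies within [ss_base, ss_base + ss_size).
 * The unsigned subtraction wraps to a huge value when sp is below
 * ss_base, so a single comparison covers both bounds.
 */
static int
on_altstack(uintptr_t sp, uintptr_t ss_base, size_t ss_size)
{
	return (sp - ss_base) < ss_size;
}
```

Because the answer is recomputed from the current stack pointer, a longjmp out of the handler back to the normal stack is reflected immediately, with no stored "on stack" flag to go stale.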


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful set of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.
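
For readers unfamiliar with XKPHYS, the address composition behind such a macro can be sketched as below. The layout assumed is the standard MIPS64 one (bits 63:62 set to binary 10, cache coherency attribute in bits 61:59, and attribute value 2 meaning uncached); the `_SKETCH` names and the function are hypothetical and do not reproduce the header's actual definitions.

```c
#include <stdint.h>

#define XKPHYS_BASE_SKETCH	0x8000000000000000ULL	/* bits 63:62 = 10 */
#define XKPHYS_CCA_SHIFT_SKETCH	59			/* CCA in bits 61:59 */
#define CCA_NC_SKETCH		2ULL			/* uncached (assumed) */

/* Build a directly-translated XKPHYS address for pa with the given CCA. */
static uint64_t
phys_to_xkphys(uint64_t pa, uint64_t cca)
{
	return XKPHYS_BASE_SKETCH | (cca << XKPHYS_CCA_SHIFT_SKETCH) | pa;
}
```

Under these assumptions, `phys_to_xkphys(0, CCA_NC_SKETCH)` is the base of the uncached window, which is the value the removed variable held on most systems.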


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing current cpu_info address into LLAddr register, for curcpu().
Unlike the previous implementation, we won't use physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() renamed to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.135 06-Jul-2021 kettenis

Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.

ok mpi@


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@
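
The buffer rules described above can be sketched in userland C. This is
only an illustration under stated assumptions, not the kernel code:
struct cpu_sketch, cpus[], NCPUS and record_panic() are hypothetical
names, and C11 <stdatomic.h> stands in for the kernel's atomic
primitives. Only the 512-byte ci_panicbuf size and the "set panicstr
atomically" idea come from the commit message.

```c
#include <stdatomic.h>
#include <stdio.h>

#define NCPUS		4	/* hypothetical CPU count for the sketch */
#define PANICBUF_LEN	512	/* buffer size named in the commit message */

/* Hypothetical stand-in for the relevant part of struct cpu_info. */
struct cpu_sketch {
	char ci_panicbuf[PANICBUF_LEN];
};

static struct cpu_sketch cpus[NCPUS];
static _Atomic(const char *) panicstr;	/* NULL until the first panic */

/*
 * Hypothetical helper: record a panic message for CPU `cpu`.
 * Returns 1 if this call was the first panic system-wide, else 0.
 */
static int
record_panic(int cpu, const char *msg)
{
	struct cpu_sketch *ci = &cpus[cpu];
	const char *expected = NULL;

	/* Only the first panic on a given CPU fills that CPU's buffer. */
	if (ci->ci_panicbuf[0] == '\0')
		snprintf(ci->ci_panicbuf, sizeof(ci->ci_panicbuf), "%s", msg);

	/* Only one CPU can win the swap from NULL; it marks the first panic. */
	return atomic_compare_exchange_strong(&panicstr, &expected,
	    (const char *)ci->ci_panicbuf);
}
```

A second record_panic() call on the same CPU leaves that CPU's buffer
untouched, and a panic on another CPU fills only its own buffer without
changing panicstr.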


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
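
As a rough illustration of how such a statement sits in a busy loop, here
is a userland sketch. It is an assumption-laden stand-in, not the kernel
macro: the empty asm with a "memory" clobber is a GCC/Clang idiom for a
compiler barrier, and spin_until_set() is a hypothetical helper. Real
code sharing the flag between CPUs would also need volatile or atomic
accesses.

```c
/*
 * Assumption: a GCC/Clang-style empty asm with a "memory" clobber acts
 * as a compiler barrier. Userland stand-in, not the kernel definition.
 */
#define CPU_BUSY_CYCLE()	__asm__ __volatile__ ("" ::: "memory")

/*
 * Hypothetical helper: spin until *flag becomes nonzero or max_spins
 * iterations have elapsed. Returns the number of iterations spent
 * waiting, or -1 on timeout.
 */
static int
spin_until_set(const int *flag, int max_spins)
{
	int spins;

	for (spins = 0; spins < max_spins; spins++) {
		if (*flag != 0)
			return spins;
		/* The barrier makes the compiler reload *flag next time. */
		CPU_BUSY_CYCLE();
	}
	return -1;
}
```

On an architecture with a pause-style instruction, the macro body would
carry that instruction instead of an empty asm statement.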


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to work around the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.
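
For readers unfamiliar with XKPHYS, the address computation behind
PHYS_TO_XKPHYS() can be sketched as follows. The bit layout (address bits
63:62 set to binary 10, cache coherency attribute in bits 61:59) is the
MIPS64 architectural definition; the names below are hypothetical, and
the value 2 for the uncached attribute is an assumption about CCA_NC
rather than something stated in this log.

```c
#include <stdint.h>

#define XKPHYS_BASE_SKETCH	0x8000000000000000ULL	/* bits 63:62 = 0b10 */
#define CCA_SHIFT_SKETCH	59			/* CCA in bits 61:59 */
#define CCA_UNCACHED_SKETCH	2ULL			/* assumed CCA_NC value */

/*
 * Hypothetical stand-in for PHYS_TO_XKPHYS(): build the directly
 * translated XKPHYS virtual address for physical address `pa` with
 * cache coherency attribute `cca`.
 */
static uint64_t
phys_to_xkphys_sketch(uint64_t pa, uint64_t cca)
{
	return XKPHYS_BASE_SKETCH | (cca << CCA_SHIFT_SKETCH) | pa;
}
```

Under these assumptions, physical address 0 with the uncached attribute
maps to 0x9000000000000000.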


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing current cpu_info address into LLAddr register, for curcpu().
Unlike the previous implementation, we won't use the physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields an 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in early stages, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanic
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, and thus detects memory size and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.134 02-Jun-2021 cheloha

kernel: introduce per-CPU panic(9) message buffers

Add a 512-byte buffer (ci_panicbuf) to each cpu_info struct on each
platform for use by panic(9). The first panic on a given CPU writes
its message to this buffer. Subsequent panics on a given CPU print
the panic message to the console but do not modify the buffer. This
aids debugging in two cases:

- If 2+ CPUs panic simultaneously there is no risk of garbled messages
in the panic buffer.

- If a CPU panics and then the operator causes a second panic while
using ddb(4), the operator can still recall the first failure on
a particular CPU.
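
The "first panic wins" rule for the per-CPU buffer can be sketched as below.
This is a minimal user-space illustration, not the kernel code: struct
cpu_sketch and record_panic() are hypothetical names, and only the 512-byte
buffer size comes from the message above.

```c
#include <stdio.h>
#include <string.h>

#define PANICBUF_LEN 512	/* buffer size stated in the commit message */

/* Hypothetical stand-in for the per-CPU state; only the buffer matters. */
struct cpu_sketch {
	char panicbuf[PANICBUF_LEN];
};

/*
 * Record a panic message for this CPU. Only the first message is kept;
 * later calls leave the buffer untouched, so the original failure can
 * still be recalled afterwards.
 */
static void
record_panic(struct cpu_sketch *ci, const char *msg)
{
	if (ci->panicbuf[0] == '\0')
		snprintf(ci->panicbuf, sizeof(ci->panicbuf), "%s", msg);
}
```

A second call such as record_panic(&ci, "later failure") leaves the first
message in place.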

Misc. changes to support this bigger change:

- Set panicstr atomically to identify the first CPU to reach panic().

- Tweak db_show_panic_cmd() to print all panic messages across all
CPUs. Prefix the first panic with an asterisk ('*').

- Prefer db_printf() to printf() during a panic if we have it.
Apparently it disturbs less global state.

- On amd64, tweak fault() to write the local panic buffer. This needs
more work.

Prompted by bluhm@ and deraadt@. Mostly written by deraadt@.
Discussed with bluhm@, deraadt@ and kettenis@.

Borne from a discussion on tech@ about making panic(9) more MP-safe:

https://marc.info/?l=openbsd-tech&m=162086462316143&w=2

ok kettenis@, visa@, bluhm@, deraadt@


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structure to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
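
As an illustration of where such a statement sits, here is a small busy loop
sketch. BUSY_CYCLE_SKETCH() and spin_until_set() are hypothetical names; the
macro body is an assumption (a GCC/Clang compiler barrier only), whereas the
commit above describes PAUSE on i386/amd64 and an empty statement elsewhere.

```c
/*
 * Hypothetical stand-in for a busy-cycle statement. Assumption: it is
 * only a compiler barrier (GCC/Clang syntax), which forces the flag to
 * be re-read from memory on every iteration.
 */
#define BUSY_CYCLE_SKETCH()	__asm__ volatile("" ::: "memory")

/*
 * Spin until *flag becomes non-zero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting.
 */
static unsigned long
spin_until_set(const int *flag, unsigned long max_spins)
{
	unsigned long spins = 0;

	while (*flag == 0 && spins < max_spins) {
		BUSY_CYCLE_SKETCH();
		spins++;
	}
	return spins;
}
```

The bound on the loop is only there so the sketch terminates when nothing
else sets the flag.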


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
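
The register-now, flush-later idea can be sketched as follows. This is a
simplified model under stated assumptions, not the pmap code: the names are
hypothetical, and merging all pending requests into one covering range is an
assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of instruction cache flushes not yet performed. */
struct icache_pending {
	bool pending;
	uintptr_t start;	/* first va to flush */
	uintptr_t end;		/* one past the last va to flush */
	unsigned int flushes;	/* number of flushes actually performed */
};

/* Enter-time step: only note that [va, va + len) needs a flush. */
static void
icache_register(struct icache_pending *p, uintptr_t va, uintptr_t len)
{
	if (!p->pending) {
		p->start = va;
		p->end = va + len;
		p->pending = true;
		return;
	}
	if (va < p->start)
		p->start = va;
	if (va + len > p->end)
		p->end = va + len;
}

/* Update-time step: perform one flush for everything registered. */
static void
icache_update(struct icache_pending *p)
{
	if (!p->pending)
		return;
	/* a real implementation would invalidate [start, end) here */
	p->flushes++;
	p->pending = false;
}
```

Several registrations in a row followed by one update result in a single
flush instead of one per mapped page.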


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
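
The dynamic check described above amounts to a range test on the stack
pointer. A minimal sketch, assuming a hypothetical struct altstack that holds
only the base and size (the kernel's own structures and function names are
not shown here):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an alternate signal stack. */
struct altstack {
	uintptr_t base;	/* lowest address of the stack area */
	size_t size;	/* size in bytes; 0 means no altstack is set */
};

/*
 * Decide from the stack pointer alone whether we are on the altstack.
 * The unsigned subtraction folds "sp >= base" and "sp < base + size"
 * into a single comparison: if sp is below base, the difference wraps
 * around to a huge value and the test fails.
 */
static bool
on_altstack(const struct altstack *ss, uintptr_t sp)
{
	return ss->size != 0 && sp - ss->base < ss->size;
}
```

Because nothing is cached, leaving the handler with longjmp needs no extra
bookkeeping: the next check simply sees a stack pointer outside the range.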


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.
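
For readers unfamiliar with XKPHYS addressing, the following sketch shows how an uncached direct-mapped address can be formed from a physical address. The constants follow the MIPS64 architecture as I understand it (bits 63:62 select XKPHYS, bits 61:59 carry the cache coherency attribute, and attribute 2 means uncached); the names carry a _SKETCH suffix because they are illustrative and not the kernel's definitions.

```c
#include <stdint.h>

/* Values assumed from the MIPS64 architecture; names are illustrative. */
#define XKPHYS_BASE_SKETCH	0x8000000000000000ULL	/* bits 63:62 = 10 */
#define CCA_SHIFT_SKETCH	59			/* CCA in bits 61:59 */
#define CCA_NC_SKETCH		2ULL			/* uncached */

/* Combine the segment base, the coherency attribute and the address. */
static uint64_t
phys_to_xkphys_sketch(uint64_t pa, uint64_t cca)
{
	return XKPHYS_BASE_SKETCH | (cca << CCA_SHIFT_SKETCH) | pa;
}
```

Under these assumptions, physical address 0x1000 maps to 0x9000000000001000 in the uncached space.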


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, the physical cpuid is no longer used to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stages, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanic
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok
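
As a rough illustration of the kind of operation described above, the sketch below sets and clears flag bits atomically. It uses C11 <stdatomic.h> as a portable stand-in; the kernel's atomic_setbits_int() and atomic_clearbits_int() are machine-dependent and are not implemented this way, and the function names here are hypothetical.

```c
#include <stdatomic.h>

/* Atomically OR `bits' into *p, so a concurrent update is never lost. */
static void
setbits_sketch(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_or(p, bits);
}

/* Atomically clear `bits' in *p. */
static void
clearbits_sketch(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_and(p, ~bits);
}
```

The point of using a single read-modify-write operation is that an interrupt handler touching the same word cannot slip in between the load and the store.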


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.133 28-May-2021 visa

Remove CPU and node id fields that were used with SGI Origin.


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.
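
The effect of the volatile qualifier can be sketched as follows. The struct and function below are simplified, hypothetical stand-ins for the real cpu_info and spl routines; they only show that, with a volatile field, each read and write in the source corresponds to an actual memory access, without extra barriers around it.

```c
/* Illustrative stand-in for the relevant part of struct cpu_info. */
struct cpu_info_sketch {
	volatile unsigned int ci_ipl;	/* current interrupt priority level */
};

/*
 * Raise the IPL if needed and return the previous level.  Because
 * ci_ipl is volatile, the compiler has to perform the read and the
 * write as written and may not cache the value in a register.
 */
static unsigned int
splraise_sketch(struct cpu_info_sketch *ci, unsigned int newipl)
{
	unsigned int oldipl = ci->ci_ipl;

	if (oldipl < newipl)
		ci->ci_ipl = newipl;
	return oldipl;
}
```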


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structure to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
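
A busy loop using such a statement might look like the sketch below. The macro here is a placeholder defined as a compiler barrier in GCC/Clang syntax, purely so the example is self-contained; the commit above describes the non-x86 definition as empty, and the loop function is hypothetical.

```c
/*
 * Placeholder definition for illustration only: a pure compiler
 * barrier (GCC/Clang syntax), not the definition from this commit.
 */
#define CPU_BUSY_CYCLE_SKETCH()	__asm__ volatile("" ::: "memory")

/* Spin until *flag becomes nonzero or max_spins tries have elapsed. */
static int
spin_wait_sketch(volatile int *flag, int max_spins)
{
	while (*flag == 0) {
		if (max_spins-- <= 0)
			return 0;	/* gave up */
		CPU_BUSY_CYCLE_SKETCH();
	}
	return 1;			/* flag observed set */
}
```

On architectures with a dedicated hint instruction, the macro body is where that instruction would go.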


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
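
The register-now, flush-later idea can be sketched as below. This is not the pmap code: the struct, the function names and the single-range coalescing policy are all assumptions made for illustration, and the actual cache flush call is left out.

```c
#include <stdint.h>

/* Hypothetical bookkeeping: one pending range, grown to cover requests. */
struct pending_flush_sketch {
	uintptr_t start;	/* inclusive */
	uintptr_t end;		/* exclusive; start == end: nothing pending */
};

/* Called where the flush used to happen: only record the range. */
static void
flush_register_sketch(struct pending_flush_sketch *pf, uintptr_t va,
    uintptr_t len)
{
	if (pf->start == pf->end) {
		pf->start = va;
		pf->end = va + len;
		return;
	}
	if (va < pf->start)
		pf->start = va;
	if (va + len > pf->end)
		pf->end = va + len;
}

/*
 * Called once at update time: this is where the merged range would be
 * handed to the real flush routine.  Returns the number of bytes covered.
 */
static uintptr_t
flush_commit_sketch(struct pending_flush_sketch *pf)
{
	uintptr_t len = pf->end - pf->start;

	pf->start = pf->end = 0;
	return len;
}
```

Two neighbouring pages registered in a row then cost a single flush. Note that this simple policy also covers any gap between non-adjacent requests, which a real implementation may want to avoid.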


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
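
A common per-cache descriptor of the kind described above could look like the following sketch. The struct layout and names are assumptions for illustration, not a copy of the header; "sets" is used in the same sense as in the commit text, that is, the number of ways.

```c
/* Illustrative layout; field names are assumptions. */
struct cache_info_sketch {
	unsigned int size;	/* total cache size, in bytes */
	unsigned int linesize;	/* line size, in bytes */
	unsigned int sets;	/* number of sets (ways), must be nonzero */
	unsigned int setsize;	/* size of one set, kept for convenience */
};

/* Fill in a descriptor, deriving the set size from the other fields. */
static struct cache_info_sketch
cache_info_make_sketch(unsigned int size, unsigned int linesize,
    unsigned int sets)
{
	struct cache_info_sketch ci;

	ci.size = size;
	ci.linesize = linesize;
	ci.sets = sets;
	ci.setsize = size / sets;
	return ci;
}
```

One such descriptor per cache (I$, D$, L2, L3) lets reporting code print the geometry without knowing the processor type.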


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
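
The dynamic check described above amounts to a range test on the stack pointer. The sketch below is a hypothetical standalone version; the name, the parameters and the exact boundary handling are assumptions, not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return nonzero if sp lies within [base, base + size).  The unsigned
 * subtraction wraps to a very large value when sp < base, so a single
 * comparison covers both bounds.
 */
static int
on_altstack_sketch(uintptr_t sp, uintptr_t base, size_t size)
{
	return (sp - base) < size;
}
```

Because the answer is computed from the current stack pointer rather than from a saved flag, it stays correct after a longjmp out of a signal handler.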


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful set of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' errata as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, the physical cpuid is no longer used to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok
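
The behaviour described above can be sketched as follows. need_resched() and
clear_resched() are the names used in the commit message; the cpu_sketch
structure and the switch_to() function are hypothetical stand-ins for the
per-CPU state and the context switch path, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state, reduced to what the sketch needs. */
struct cpu_sketch {
	bool	want_resched;	/* set when a context switch is requested */
	int	curproc;	/* id of the process currently running */
};

static void
need_resched(struct cpu_sketch *ci)
{
	ci->want_resched = true;
}

static void
clear_resched(struct cpu_sketch *ci)
{
	ci->want_resched = false;
}

/* Stand-in for the context switch path. */
static void
switch_to(struct cpu_sketch *ci, int nextproc)
{
	if (nextproc != ci->curproc)
		ci->curproc = nextproc;
	/*
	 * Clear the flag unconditionally, even when the same process was
	 * picked again; otherwise every later AST would retry the switch.
	 */
	clear_resched(ci);
}
```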


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.132 05-May-2021 visa

Remove unneeded tlb_set_gbase() that was used with R8000.

Pointed out by miod@


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structure to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
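
A minimal sketch of how such a statement sits in a busy loop is given below.
Only the CPU_BUSY_CYCLE() name and the use of PAUSE on i386/amd64 come from
the commit message. The compiler barrier in the fallback branch is an
assumption of this sketch (the commit describes the macro as empty there),
and spin_until_set() is a hypothetical caller.

```c
#include <stdatomic.h>

#if defined(__i386__) || defined(__x86_64__)
/* PAUSE hints to the CPU that this is a spin-wait loop. */
#define CPU_BUSY_CYCLE()	__asm__ volatile("pause" ::: "memory")
#else
/* Assumption of this sketch: a plain compiler barrier elsewhere. */
#define CPU_BUSY_CYCLE()	__asm__ volatile("" ::: "memory")
#endif

/* Hypothetical caller: spin until another CPU sets *flag to non-zero. */
static void
spin_until_set(atomic_int *flag)
{
	while (atomic_load_explicit(flag, memory_order_acquire) == 0)
		CPU_BUSY_CYCLE();
}
```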


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
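
The register-then-flush split described above can be sketched as follows.
This is an illustration under simplifying assumptions, not the pmap code:
the pending_iflush structure and both functions are hypothetical, and
pending ranges are merged into one covering range, which may flush more
than strictly needed when the ranges are far apart.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of I$ flushes that have been postponed. */
struct pending_iflush {
	bool		valid;	/* true if a range is pending */
	uintptr_t	start;	/* first va to flush (inclusive) */
	uintptr_t	end;	/* last va to flush (exclusive) */
};

/* Where pmap_enter() used to flush: only widen the pending range. */
static void
iflush_defer(struct pending_iflush *p, uintptr_t va, size_t len)
{
	if (!p->valid) {
		p->valid = true;
		p->start = va;
		p->end = va + len;
		return;
	}
	if (va < p->start)
		p->start = va;
	if (va + len > p->end)
		p->end = va + len;
}

/*
 * Where pmap_update() would run: issue at most one flush for the whole
 * pending range. Returns the number of flushes issued (0 or 1).
 */
static int
iflush_commit(struct pending_iflush *p, void (*flush)(uintptr_t, size_t))
{
	if (!p->valid)
		return 0;
	if (flush != NULL)
		flush(p->start, (size_t)(p->end - p->start));
	p->valid = false;
	return 1;
}
```

With this shape, a run of pmap_enter() calls for neighbouring executable
pages costs a single cache operation instead of one per page.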


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by creating extra wired TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
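
A minimal sketch of such a per-cache descriptor, assuming the fields named in
the message above. The struct and function names (sketch_cache_info,
sketch_cache_fill()) are hypothetical and not taken from the header; the only
point shown is that the set size can be derived from the total size and the
number of sets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common descriptor, one instance per cache (I$, D$, L2, L3). */
struct sketch_cache_info {
	unsigned int size;	/* total cache size, in bytes */
	unsigned int linesize;	/* line size, in bytes */
	unsigned int setsize;	/* bytes per set (way), kept for convenience */
	unsigned int sets;	/* number of sets (ways) */
};

/* Fill a descriptor; setsize is derived rather than passed in. */
static void
sketch_cache_fill(struct sketch_cache_info *c, unsigned int size,
    unsigned int linesize, unsigned int sets)
{
	assert(c != NULL && sets != 0 && size % sets == 0);
	c->size = size;
	c->linesize = linesize;
	c->sets = sets;
	c->setsize = size / sets;
}
```

Code that prints cache details can then read the sets field directly instead
of switching on the processor type.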


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
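
The dynamic check can be sketched as a range test on the stack pointer. This
is an illustration only: sketch_on_altstack() and its parameters are
hypothetical names, and treating the range as half-open (base included, base
plus size excluded) is an assumption of this sketch, not something stated in
the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if sp lies within [ss_base, ss_base + ss_size), else 0.
 * Unsigned subtraction wraps around when sp < ss_base, so a single
 * comparison covers both bounds; a size of 0 (no alternate stack)
 * always yields 0.
 */
static int
sketch_on_altstack(uintptr_t sp, uintptr_t ss_base, size_t ss_size)
{
	assert(ss_size <= UINTPTR_MAX - ss_base);	/* range must not wrap */
	return (sp - ss_base) < ss_size;
}
```

Because the answer is computed from the current stack pointer rather than
from a saved flag, it stays correct after a longjmp out of the handler.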


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, we no longer use the physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
It avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok
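
The effect of clearing the flag can be sketched as below. The names
(struct sketch_cpu, sketch_need_resched(), sketch_clear_resched(),
sketch_ast()) are hypothetical stand-ins, and a counter replaces the real
context switch; the sketch only shows why a flag that is never cleared leads
to a switch attempt on every AST.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cpu state holding the reschedule request flag. */
struct sketch_cpu {
	volatile int want_resched;
	unsigned int switch_attempts;	/* stands in for real context switches */
};

#define sketch_need_resched(ci)		((ci)->want_resched = 1)
#define sketch_clear_resched(ci)	((ci)->want_resched = 0)

/* AST servicing: attempt a switch only when one has been requested. */
static void
sketch_ast(struct sketch_cpu *ci)
{
	assert(ci != NULL);
	if (ci->want_resched) {
		ci->switch_attempts++;		/* the switch would happen here */
		sketch_clear_resched(ci);	/* cleared even if the same proc is picked */
	}
}
```

After one request, repeated sketch_ast() calls attempt exactly one switch;
without the sketch_clear_resched() call, each of them would attempt another.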


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok
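
As a rough userland illustration of why atomic bit operations matter here,
the sketch below uses C11 <stdatomic.h>. The functions sketch_setbits_int()
and sketch_clearbits_int() are hypothetical stand-ins for the kernel's
atomic_{set,clear}bits_int mentioned above, not their implementation, and the
flag values in the usage note are made up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomically set the given bits: a single read-modify-write step. */
static void
sketch_setbits_int(atomic_uint *p, unsigned int bits)
{
	assert(p != NULL);
	atomic_fetch_or(p, bits);
}

/* Atomically clear the given bits. */
static void
sketch_clearbits_int(atomic_uint *p, unsigned int bits)
{
	assert(p != NULL);
	atomic_fetch_and(p, ~bits);
}
```

A plain "flag |= bits" is a separate load, modify and store, so an interrupt
between the load and the store can have its own update overwritten; the
atomic forms avoid that. For example, setting 0x4 and then 0x1 on a zeroed
flag word and clearing 0x4 leaves 0x1.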


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.131 01-May-2021 visa

Retire OpenBSD/sgi.

OK deraadt@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@
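
A small illustration of the problem this kind of change avoids. The macros
below are hypothetical examples and are not taken from the header.

```c
#include <assert.h>

/* Unguarded: the argument is pasted as-is, so precedence can bite. */
#define SKETCH_DOUBLE_UNGUARDED(x)	(x * 2)
/* Guarded: the argument is evaluated as a unit before the multiply. */
#define SKETCH_DOUBLE_GUARDED(x)	((x) * 2)

/* 1 + 2 * 2 parses as 1 + (2 * 2), which is 5, not the intended 6. */
static_assert(SKETCH_DOUBLE_UNGUARDED(1 + 2) == 5, "precedence surprise");
static_assert(SKETCH_DOUBLE_GUARDED(1 + 2) == 6, "argument kept intact");
```

The two forms agree for simple arguments such as a single variable, which is
why unguarded macros can go unnoticed until a caller passes an expression.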


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
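
A minimal sketch of how such a statement might sit in a busy loop body. The
macro below is a hypothetical stand-in, not the kernel's definition: it
assumes GCC/Clang inline assembly, and the "memory" clobber and the
spin_until_set() helper are illustrative choices.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for CPU_BUSY_CYCLE(): PAUSE on i386/amd64,
 * empty elsewhere, as the commit message describes. The "memory"
 * clobber is an assumption of this sketch.
 */
#if defined(__i386__) || defined(__x86_64__)
#define CPU_BUSY_CYCLE()	__asm__ volatile("pause" ::: "memory")
#else
#define CPU_BUSY_CYCLE()	do { /* nothing */ } while (0)
#endif

/*
 * Spin until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent spinning.
 */
static unsigned long
spin_until_set(atomic_int *flag, unsigned long max_spins)
{
	unsigned long n = 0;

	while (atomic_load_explicit(flag, memory_order_acquire) == 0 &&
	    n < max_spins) {
		CPU_BUSY_CYCLE();	/* hint to the CPU that we are spinning */
		n++;
	}
	return n;
}
```

The spin bound exists only to keep the sketch testable; a real busy loop
would usually spin until the condition holds.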


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
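
The register-now, flush-later idea can be sketched roughly as follows. All
names here are hypothetical, and collapsing an overflowing list into one
whole-cache flush is an assumption of this sketch, not necessarily what the
cpu-specific cache code does.

```c
#include <stddef.h>

#define PENDING_MAX	4	/* arbitrary bound for the sketch */

struct pending_flushes {
	unsigned long va[PENDING_MAX];	/* pages waiting for an I$ flush */
	size_t count;			/* number of valid entries in va[] */
	int overflow;			/* too many pages: flush everything */
	size_t page_flushes;		/* counters, for illustration only */
	size_t full_flushes;
};

/* Called where pmap_enter() would previously have flushed immediately. */
static void
flush_register(struct pending_flushes *pf, unsigned long va)
{
	if (pf->overflow)
		return;
	if (pf->count == PENDING_MAX) {
		pf->overflow = 1;
		return;
	}
	pf->va[pf->count++] = va;
}

/* Called where pmap_update() would perform the postponed work. */
static void
flush_perform(struct pending_flushes *pf)
{
	if (pf->overflow)
		pf->full_flushes++;		/* one flush of the whole I$ */
	else
		pf->page_flushes += pf->count;	/* one flush per page */
	pf->count = 0;
	pf->overflow = 0;
}
```

With this shape, a run of registrations costs at most PENDING_MAX per-page
flushes or a single full flush, rather than one flush per mapping.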


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by creating extra wired TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
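
As an illustration, a common per-cache descriptor along these lines might
look like the sketch below. The struct and field names are assumptions, not
necessarily those used in the tree; "sets" follows the commit's terminology
(the number of ways).

```c
#include <stdint.h>

/* Hypothetical per-cache descriptor, one instance each for I$, D$, L2, L3. */
struct cache_desc {
	uint32_t size;		/* total size in bytes */
	uint32_t linesize;	/* line size in bytes */
	uint32_t setsize;	/* size of one set, kept for convenience */
	uint32_t sets;		/* number of sets (ways) */
};

/* Fill in a descriptor and derive the set size from the other fields. */
static void
cache_desc_init(struct cache_desc *c, uint32_t size, uint32_t linesize,
    uint32_t sets)
{
	c->size = size;
	c->linesize = linesize;
	c->sets = sets;
	c->setsize = (sets != 0) ? size / sets : 0;
}
```

Code that prints the cache configuration can then read the number of ways
from the descriptor instead of switching on the processor type.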


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
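
The dynamic check amounts to a range test on the stack pointer. A minimal
sketch, with a hypothetical helper name and plain integer arguments rather
than the kernel's own types:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if sp lies within the alternate stack [base, base + size),
 * 0 otherwise. Hypothetical helper, for illustration only.
 */
static int
on_altstack(uintptr_t sp, uintptr_t base, size_t size)
{
	return (sp >= base && sp - base < size);
}
```

Because the answer is computed from the current stack pointer each time, no
saved "on stack" flag can go stale after a longjmp out of the handler.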


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.
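
A rough sketch of the arrangement, using hypothetical names throughout: a
configuration routine fills in a per-cpu function pointer, and an invocation
macro dispatches through it unless a platform has already defined the macro.

```c
#include <stddef.h>

/* Hypothetical cut-down per-cpu structure. */
struct cpu_sketch {
	unsigned int dcache_syncs;	/* counter, for illustration only */
	void (*sync_dcache)(struct cpu_sketch *, unsigned long, size_t);
};

/* One possible cache routine; a real one would issue cache instructions. */
static void
generic_sync_dcache(struct cpu_sketch *ci, unsigned long va, size_t len)
{
	(void)va;
	(void)len;
	ci->dcache_syncs++;
}

/* Plays the role of a ConfigCache routine: fill in the function pointers. */
static void
config_cache(struct cpu_sketch *ci)
{
	ci->dcache_syncs = 0;
	ci->sync_dcache = generic_sync_dcache;
}

/*
 * Invocation macro. A platform with a single set of cache routines could
 * define SYNC_DCACHE beforehand to call its routine directly.
 */
#ifndef SYNC_DCACHE
#define SYNC_DCACHE(ci, va, len) \
	((ci)->sync_dcache)((ci), (va), (len))
#endif
```

The indirect call only costs anything on kernels that must support several
cache implementations at once.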


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, the physical cpuid is no longer used to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns XKPHYS address of allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok
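
For illustration, the effect of such atomic bit operations can be sketched
with C11 atomics. The helper names and the use of <stdatomic.h> are
assumptions of this sketch; the kernel provides its own
atomic_{set,clear}bits_int implementations per architecture.

```c
#include <stdatomic.h>

/* Atomically set the given bits in *p (hypothetical helper). */
static void
sketch_setbits_int(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_or(p, bits);
}

/* Atomically clear the given bits in *p (hypothetical helper). */
static void
sketch_clearbits_int(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_and(p, ~bits);
}
```

A plain `flags |= bit` is a separate load, modify and store, so an interrupt
that changes the word in between can have its update overwritten; the atomic
forms avoid that window.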


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.130 11-Jul-2020 visa

Synchronize each core's CP0 cycle counter using the IO clock counter.
This makes the cycle counter usable as timecounter on multiprocessor
machines.

Idea from Linux.

Tested on CN5020, CN6120, CN7130 and CN7360.

Looks reasonable to kettenis@


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
arent kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs
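
As an illustration of the value mapping listed above, here is a minimal sketch in C. The enum and helper names are hypothetical and are not taken from the kernel sources; only the three numeric values come from the commit message.

```c
/* Hypothetical names; only the values 0, 1 and 2 come from the commit message. */
enum lid_action {
	LID_ACTION_NONE = 0,		/* do nothing */
	LID_ACTION_SUSPEND = 1,		/* suspend */
	LID_ACTION_HIBERNATE = 2	/* hibernate */
};

/* Return a printable name for a machdep.lidaction value, or "invalid". */
static const char *
lid_action_name(int value)
{
	switch (value) {
	case LID_ACTION_NONE:
		return "none";
	case LID_ACTION_SUSPEND:
		return "suspend";
	case LID_ACTION_HIBERNATE:
		return "hibernate";
	default:
		return "invalid";
	}
}
```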


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
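
A minimal userland sketch of how such a statement might sit in a busy loop body. The macro definitions below are stand-ins written for this example (PAUSE on x86, empty elsewhere, as the message describes), and spin_until_set() is a hypothetical helper, not a kernel function.

```c
#include <stdatomic.h>

/* Stand-in definitions for illustration; the real macro lives in MD headers. */
#if defined(__i386__) || defined(__x86_64__)
#define CPU_BUSY_CYCLE()	__asm__ volatile ("pause" ::: "memory")
#else
#define CPU_BUSY_CYCLE()	do { /* nothing */ } while (0)
#endif

/*
 * Hypothetical helper: spin until *flag becomes non-zero or max_spins
 * iterations have elapsed.  Returns the number of iterations spent spinning.
 */
static unsigned long
spin_until_set(atomic_int *flag, unsigned long max_spins)
{
	unsigned long spins = 0;

	while (atomic_load(flag) == 0 && spins < max_spins) {
		CPU_BUSY_CYCLE();	/* hint to the CPU that we are busy-waiting */
		spins++;
	}
	return spins;
}
```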


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to work around the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
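
A rough sketch of what such a common per-cache struct could look like. The field and function names are assumptions made for this example and may not match the actual header; the point is only that the set size and way count can be derived from stored values instead of the processor type.

```c
#include <stdint.h>

/* Assumed layout for illustration; one instance per cache (I$, D$, L2, L3). */
struct cache_info {
	uint32_t size;		/* total cache size, in bytes */
	uint32_t linesize;	/* line size, in bytes */
	uint32_t setsize;	/* size of one set (way), in bytes */
	uint32_t sets;		/* number of sets (ways) */
};

/* Fill in a cache description, deriving the set size for convenience. */
static void
cache_info_init(struct cache_info *ci, uint32_t size, uint32_t linesize,
    uint32_t sets)
{
	ci->size = size;
	ci->linesize = linesize;
	ci->sets = sets;
	ci->setsize = (sets != 0) ? size / sets : 0;
}

/* Number of cache lines in one set, or 0 if the line size is unknown. */
static uint32_t
cache_lines_per_set(const struct cache_info *ci)
{
	return (ci->linesize != 0) ? ci->setsize / ci->linesize : 0;
}
```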


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
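
The dynamic check described above can be sketched as a range test on the stack pointer. The struct and function names here are hypothetical, and the exact boundary handling in the kernel may differ; this sketch treats the half-open range [base, base + size) as "on the alternate stack".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a registered alternate signal stack. */
struct altstack {
	uintptr_t base;		/* lowest address of the alternate stack */
	size_t size;		/* size of the alternate stack, in bytes */
	int disabled;		/* non-zero if no alternate stack is in use */
};

/*
 * Return non-zero if sp currently lies within the alternate stack.
 * The unsigned subtraction wraps to a large value when sp < base,
 * so a single comparison covers both bounds.
 */
static int
on_altstack(const struct altstack *ss, uintptr_t sp)
{
	if (ss->disabled)
		return 0;
	return (sp - ss->base) < ss->size;
}
```

Because the answer is computed from the current stack pointer rather than from a saved flag, it stays correct after a longjmp out of the signal handler.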


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.
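
A sketch of what such a wrapper macro might look like. The "memory" clobber and the non-MIPS fallback are assumptions added so the example compiles anywhere; write_then_sync() and fake_reg are hypothetical names used only to show a call site.

```c
#if defined(__mips__)
/* Assumed form of the wrapper; the clobber list is a guess. */
#define mips_sync()	__asm__ volatile ("sync" ::: "memory")
#else
/* Illustration-only fallback: a compiler barrier, not a hardware sync. */
#define mips_sync()	__asm__ volatile ("" ::: "memory")
#endif

static volatile unsigned int fake_reg;	/* hypothetical device register */

/* Store a value, issue the sync, then read the value back. */
static unsigned int
write_then_sync(unsigned int value)
{
	fake_reg = value;
	mips_sync();
	return fake_reg;
}
```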


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, we no longer use the physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns XKPHYS address of allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in December 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.129 31-May-2020 dlg

introduce "cpu_rnd_messybits" for use instead of nanotime in dev/rnd.c.

rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.

there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
arent kept consistent between cores may even be desirable in this
context.

so this is the first step in converting rnd.c to reading cycle
counter. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.

djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.
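
a rough sketch of the idea of a nanotime-style backend, written as a pure userland function. the name mix_messybits() and the shift amount are assumptions made for this example; the kernel version would read the time itself rather than take it as arguments.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fold a seconds/nanoseconds pair into one word
 * whose low bits change quickly between events.  The result is not a
 * timestamp and carries no ordering guarantee.
 */
static inline unsigned int
mix_messybits(uint64_t sec, uint64_t nsec)
{
	return (unsigned int)(nsec ^ (sec << 20));
}
```

an MD replacement would return a cycle counter value instead, since only the changing bits matter to the caller.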


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.
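
The effect of the volatile qualifier can be sketched in plain C. This is an
illustration only, assuming a simplified per-CPU structure; cpu_info_sketch,
splraise_sketch() and splx_sketch() are hypothetical names and not the mips64
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU state; only the volatile qualifier is the point. */
struct cpu_info_sketch {
	volatile int ci_ipl;	/* current interrupt priority level */
};

/* Raise the level, never lower it; returns the previous level. */
static inline int
splraise_sketch(struct cpu_info_sketch *ci, int newipl)
{
	int old;

	assert(ci != NULL);
	old = ci->ci_ipl;	/* volatile read: always performed */
	if (old < newipl)
		ci->ci_ipl = newipl;	/* volatile write: not elided */
	return old;
}

/* Restore a level previously returned by splraise_sketch(). */
static inline void
splx_sketch(struct cpu_info_sketch *ci, int ipl)
{
	assert(ci != NULL);
	ci->ci_ipl = ipl;
}
```

Because every access goes through a volatile lvalue, the compiler has to emit
each load and store and keep them in program order relative to other volatile
accesses, which is the property the message relies on.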


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@
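
A minimal sketch of why macro arguments are guarded, using made-up macros
(SCALE_UNGUARDED and SCALE_GUARDED are hypothetical and do not appear in
cpu.h):

```c
#include <assert.h>

/* Unguarded: the argument is pasted in as-is, so precedence can bite. */
#define SCALE_UNGUARDED(x)	(x * 4)

/* Guarded: the argument is parenthesized before it is used. */
#define SCALE_GUARDED(x)	((x) * 4)

/* Demonstrates the difference for a compound argument. */
static inline void
check_macros_sketch(void)
{
	/* Expands to (1 + 2 * 4): multiplication binds first. */
	assert(SCALE_UNGUARDED(1 + 2) == 9);
	/* Expands to ((1 + 2) * 4): the argument stays grouped. */
	assert(SCALE_GUARDED(1 + 2) == 12);
}
```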


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos i'm too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
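
How such a statement sits in a busy loop can be sketched as follows. This is
an assumption-laden illustration: BUSY_CYCLE_SKETCH() and spin_wait_sketch()
are hypothetical names, the macro body is only a compiler barrier in
GCC/Clang syntax, and a real port may use a pause-style instruction instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an MD busy-cycle hook. This version is only
 * a compiler barrier (GCC/Clang extended asm).
 */
#define BUSY_CYCLE_SKETCH()	__asm__ __volatile__("" ::: "memory")

/*
 * Spin until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting.
 */
static inline unsigned int
spin_wait_sketch(const int *flag, unsigned int max_spins)
{
	unsigned int spins = 0;

	assert(flag != NULL);
	while (*flag == 0 && spins < max_spins) {
		BUSY_CYCLE_SKETCH();	/* forces *flag to be re-read */
		spins++;
	}
	return spins;
}
```

The bound on the loop is only there to keep the sketch terminating; a kernel
spin loop would normally wait for another CPU to change the flag.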


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
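
The register-now, flush-later idea can be sketched in portable C. The names
below (pending_flush_sketch, flush_defer_sketch(), flush_update_sketch()) are
hypothetical, the fixed-size array is an arbitrary simplification, and a
counter stands in for the actual cache operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PENDING_MAX_SKETCH	8

/* Hypothetical record of instruction cache flushes not yet performed. */
struct pending_flush_sketch {
	uintptr_t	va[PENDING_MAX_SKETCH];
	size_t		count;
	unsigned int	flushes_done;	/* stands in for real cache work */
};

/* Called where a flush used to happen: only remember the page. */
static inline void
flush_defer_sketch(struct pending_flush_sketch *pf, uintptr_t va)
{
	size_t i;

	for (i = 0; i < pf->count; i++)
		if (pf->va[i] == va)
			return;		/* already registered */
	assert(pf->count < PENDING_MAX_SKETCH);
	pf->va[pf->count++] = va;
}

/* Called once per batch: perform every registered flush, then reset. */
static inline unsigned int
flush_update_sketch(struct pending_flush_sketch *pf)
{
	unsigned int n = (unsigned int)pf->count;

	pf->flushes_done += n;	/* a real implementation would flush here */
	pf->count = 0;
	return n;
}
```

Having the whole batch available at update time is what would let
cache-specific code merge or skip operations, as the message describes.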


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
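
The dynamic check described above amounts to a range test on the stack
pointer. A minimal sketch, assuming a simplified description of the alternate
stack; altstack_sketch and on_altstack_sketch() are hypothetical names, not
the kernel's interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an alternate signal stack. */
struct altstack_sketch {
	uintptr_t	base;		/* lowest address of the stack area */
	size_t		size;		/* size of the area in bytes */
	int		disabled;	/* nonzero if no altstack is set */
};

/*
 * Decide from the stack pointer alone whether we are on the alternate
 * stack. The unsigned subtraction wraps for sp < base, so a single
 * comparison covers both bounds.
 */
static inline int
on_altstack_sketch(const struct altstack_sketch *ss, uintptr_t sp)
{
	assert(ss != NULL);
	if (ss->disabled)
		return 0;
	return (sp - ss->base) < ss->size;
}
```

Because nothing is cached in a flag, the answer stays correct after a longjmp
out of the handler moves the stack pointer back to the normal stack.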


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, the physical cpuid is no longer used to
fetch curcpu(). This is required to implement IP27/35 SMP.
Implement getcurcpu() and setcurcpu() for this; rename smp_malloc() to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields an 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok
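
The problem and the fix can be sketched with a flag and a counter. This is
illustrative only: sched_state_sketch and the *_sketch() functions are
hypothetical names, and the counter stands in for an attempted context
switch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU scheduling state. */
struct sched_state_sketch {
	volatile int	want_resched;	/* set when a switch is requested */
	unsigned int	switch_attempts;	/* attempted context switches */
};

/* Request a reschedule at the next AST. */
static inline void
need_resched_sketch(struct sched_state_sketch *s)
{
	assert(s != NULL);
	s->want_resched = 1;
}

/* Drop the request; done after every switch attempt. */
static inline void
clear_resched_sketch(struct sched_state_sketch *s)
{
	assert(s != NULL);
	s->want_resched = 0;
}

/* AST-style check: attempt a switch only if one was requested. */
static inline void
ast_sketch(struct sched_state_sketch *s)
{
	assert(s != NULL);
	if (!s->want_resched)
		return;
	s->switch_attempts++;
	/* Even if the same proc is picked again, clear the request. */
	clear_resched_sketch(s);
}
```

Without the clear_resched_sketch() call, every later ast_sketch() call would
count another attempt, which mirrors the repeated context switch attempts
described in the message.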


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok
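
The style of update described above can be approximated in userland with C11
atomics. This sketch is an analogue, not the kernel's atomic.h: the
setbits_sketch()/clearbits_sketch() helpers and the flag values are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag bits, standing in for real p_flag values. */
#define PF_FLAG_A_SKETCH	0x01u
#define PF_FLAG_B_SKETCH	0x02u

/* Atomically set bits: a single read-modify-write, safe against races. */
static inline void
setbits_sketch(atomic_uint *p, unsigned int bits)
{
	assert(p != NULL);
	atomic_fetch_or(p, bits);
}

/* Atomically clear bits. */
static inline void
clearbits_sketch(atomic_uint *p, unsigned int bits)
{
	assert(p != NULL);
	atomic_fetch_and(p, ~bits);
}
```

A plain `flags |= bit` is a separate load, modify and store, so an interrupt
that changes the word in between can have its update lost; the atomic form
avoids that.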


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memory size and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.128 02-Sep-2019 deraadt

in non-MP, cpu_number() the #define should be 0UL; ok visa


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@
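
To illustrate why macro arguments are guarded with parentheses, here is a small sketch. The macros are hypothetical and are not taken from cpu.h; they only show the precedence problem the guard avoids.

```c
/* Hypothetical macros for illustration; not taken from cpu.h. */
#define EX_DOUBLE_UNGUARDED(x)	(x * 2)
#define EX_DOUBLE_GUARDED(x)	((x) * 2)

/* Expands to (1 + 2 * 2): only the 2 is multiplied. */
static int
ex_unguarded_result(void)
{
	return EX_DOUBLE_UNGUARDED(1 + 2);
}

/* Expands to ((1 + 2) * 2): the whole argument is doubled. */
static int
ex_guarded_result(void)
{
	return EX_DOUBLE_GUARDED(1 + 2);
}
```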


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
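
A minimal sketch of how a busy-loop hint of this kind is meant to be used. EX_BUSY_CYCLE() is a hypothetical stand-in: per this commit the macro is empty outside i386/amd64, and the compiler barrier shown here (GCC/Clang inline asm syntax) is an assumption of the sketch, not what this revision contains.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for a busy-cycle hint: no instruction is emitted,
 * but the "memory" clobber keeps the compiler from caching values across it.
 */
#define EX_BUSY_CYCLE()	__asm__ volatile("" ::: "memory")

/*
 * Spin until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting.
 */
static unsigned int
ex_spin_until_set(atomic_int *flag, unsigned int max_spins)
{
	unsigned int spins = 0;

	while (atomic_load_explicit(flag, memory_order_acquire) == 0 &&
	    spins < max_spins) {
		EX_BUSY_CYCLE();
		spins++;
	}
	return spins;
}
```

On a CPU with a real pause-style instruction, the macro body would be the place to put it, so callers stay machine-independent.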


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
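
A sketch of what such a common per-cache descriptor could look like. The struct and field names are hypothetical and are not the ones used in the tree; the only relationship taken from the description above is that the set size is stored for convenience and follows from the total size and the number of sets (ways).

```c
/* Hypothetical per-cache descriptor; names are not from cpu.h. */
struct ex_cache_info {
	unsigned int size;	/* total size in bytes */
	unsigned int linesize;	/* line size in bytes */
	unsigned int sets;	/* number of sets (ways) */
	unsigned int setsize;	/* size of one set, kept for convenience */
};

/* Fill a descriptor, deriving setsize from size and sets. */
static struct ex_cache_info
ex_cache_make(unsigned int size, unsigned int linesize, unsigned int sets)
{
	struct ex_cache_info c;

	c.size = size;
	c.linesize = linesize;
	c.sets = sets;
	c.setsize = (sets != 0) ? size / sets : 0;
	return c;
}
```

With one descriptor per cache level (I$, D$, L2, L3), printing code can report the number of ways from the data instead of switching on the processor type.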


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
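
The dynamic check described above can be sketched as a simple range test on the stack pointer. The struct and function names are hypothetical, and the exact boundary conditions used by the kernel may differ; this only shows the idea of deriving the answer from the current stack pointer instead of from a saved flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an alternate signal stack. */
struct ex_altstack {
	uintptr_t base;		/* lowest address of the stack area */
	size_t size;		/* size in bytes; 0 means no altstack */
};

/* Return 1 if sp currently lies within the alternate stack, else 0. */
static int
ex_on_altstack(const struct ex_altstack *as, uintptr_t sp)
{
	if (as->size == 0)
		return 0;
	return sp >= as->base && sp < as->base + as->size;
}
```

Because the result depends only on where the stack pointer is now, a longjmp out of a signal handler back onto the normal stack is handled without any extra bookkeeping.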


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Store the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, the physical cpuid is no longer used to
fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns
the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and
also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in December 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok
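
The idea behind the change above can be illustrated in portable C11. This is only a sketch: the helper names sketch_setbits_int() and sketch_clearbits_int() are hypothetical stand-ins, and the kernel's own atomic_{set,clear}bits_int are implemented per architecture, not with <stdatomic.h>.

```c
#include <stdatomic.h>

/*
 * Hypothetical user-space stand-ins for atomic_setbits_int() and
 * atomic_clearbits_int(). Each update is a single atomic
 * read-modify-write, so a concurrent update to other bits of the
 * same word cannot be lost.
 */
static inline void
sketch_setbits_int(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_or(p, bits);
}

static inline void
sketch_clearbits_int(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_and(p, ~bits);
}
```

A plain `flags |= bit` compiles to a separate load, OR and store, which is where an interrupt arriving in between could overwrite another update.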


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraad@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.127 05-May-2019 visa

Turn need_resched() and signotify() into proper functions on mips64.


Revision tags: OPENBSD_6_5_BASE
# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
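
A busy loop using such a hint might look like the sketch below. SKETCH_BUSY_CYCLE() and sketch_spin_until_set() are hypothetical names, and the macro body here is only a compiler barrier in GCC/Clang inline-asm syntax; the real CPU_BUSY_CYCLE() is defined per architecture (PAUSE on i386/amd64, per this commit).

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for a per-architecture CPU_BUSY_CYCLE().
 * This version emits no instruction; it only stops the compiler from
 * caching memory values across loop iterations (GCC/Clang syntax).
 */
#define SKETCH_BUSY_CYCLE()	__asm__ volatile("" ::: "memory")

/*
 * Spin until *flag becomes non-zero or max_spins iterations have
 * passed. Returns the number of iterations spent waiting.
 */
static unsigned long
sketch_spin_until_set(atomic_int *flag, unsigned long max_spins)
{
	unsigned long spins = 0;

	while (atomic_load_explicit(flag, memory_order_acquire) == 0 &&
	    spins < max_spins) {
		SKETCH_BUSY_CYCLE();
		spins++;
	}
	return spins;
}
```

The bounded spin count is an addition for the sake of the example; kernel spin loops usually wait without a limit.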


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
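
One way to picture the deferral described above is the sketch below. It is an assumption-laden illustration, not the pmap code: the struct, the function names and the choice to coalesce pending flushes into a single address range are all hypothetical, and the commit does not say how the cpu-specific code actually merges requests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of I$ flushes that were requested but not done yet. */
struct sketch_icache_pending {
	uintptr_t start;	/* lowest va waiting to be flushed */
	uintptr_t end;		/* one past the highest va waiting */
	int valid;		/* non-zero if anything is pending */
};

/* Counts how many real flush operations would have been issued. */
static unsigned int sketch_flushes;

/* Plays the role of pmap_enter(): only note the range, do not flush. */
static void
sketch_icache_register(struct sketch_icache_pending *p, uintptr_t va,
    size_t len)
{
	uintptr_t end = va + len;

	if (!p->valid) {
		p->start = va;
		p->end = end;
		p->valid = 1;
		return;
	}
	if (va < p->start)
		p->start = va;
	if (end > p->end)
		p->end = end;
}

/* Plays the role of pmap_update(): perform the pending work once. */
static void
sketch_icache_update(struct sketch_icache_pending *p)
{
	if (!p->valid)
		return;
	sketch_flushes++;	/* a kernel would invalidate [start, end) here */
	p->valid = 0;
}
```

With three neighbouring pages entered in a row, this performs one flush instead of three.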


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
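
The dynamic check described above amounts to a range test on the stack pointer. The following is a minimal sketch under assumed names (struct sketch_altstack, sketch_on_altstack()); it is not the kernel's routine, and it treats the alternate stack as the half-open range [base, base + size).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a registered alternate signal stack. */
struct sketch_altstack {
	uintptr_t base;		/* lowest address of the stack area */
	size_t size;		/* size in bytes, 0 if none registered */
};

/*
 * Return 1 if sp lies within [base, base + size), else 0.
 * The subtraction is unsigned, so an sp below base wraps to a huge
 * value and fails the comparison; one test covers both bounds.
 */
static int
sketch_on_altstack(const struct sketch_altstack *ss, uintptr_t sp)
{
	return ss->size != 0 && (sp - ss->base) < ss->size;
}
```

Because the answer is recomputed from the current stack pointer each time, a longjmp out of the handler needs no extra bookkeeping: once sp is back on the normal stack, the test returns 0.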


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.
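
The arrangement described above can be sketched as follows. All names here (struct sketch_cpu_info, its fields, sketch_ConfigCache(), the Sketch_SyncDCachePage() macro and the line size of 32) are hypothetical and chosen for illustration; they do not reproduce the actual struct cpu_info layout or routine names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cpu structure carrying cache data and routines. */
struct sketch_cpu_info {
	unsigned int ci_l1dlinesize;	/* cache geometry, filled at config */
	void (*ci_SyncDCachePage)(struct sketch_cpu_info *, uintptr_t va,
	    uintptr_t pa);
	unsigned long ci_syncs;		/* only here to observe calls */
};

/* One cpu-specific implementation; a real one would issue cache ops. */
static void
sketch_r5k_SyncDCachePage(struct sketch_cpu_info *ci, uintptr_t va,
    uintptr_t pa)
{
	(void)va;
	(void)pa;
	ci->ci_syncs++;
}

/* Plays the role of ConfigCache: fill in geometry and function pointers. */
static void
sketch_ConfigCache(struct sketch_cpu_info *ci)
{
	ci->ci_l1dlinesize = 32;
	ci->ci_SyncDCachePage = sketch_r5k_SyncDCachePage;
}

/*
 * Invocation macro going through the per-cpu pointer. A platform with
 * a single set of cache routines could redefine it as a direct call.
 */
#define Sketch_SyncDCachePage(ci, va, pa) \
	((*(ci)->ci_SyncDCachePage)((ci), (va), (pa)))
```

Callers use only the macro, so they do not need to know which cpu-specific routine was selected at configuration time.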


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing the current cpu_info address in the LLAddr register, for curcpu().
Unlike the previous implementation, we no longer use the physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stages, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in December 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanic
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok
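
As an illustration of why atomic bit operations matter for a flag word touched from interrupt context, here is a sketch built on C11 `<stdatomic.h>`. The `demo_setbits_int`/`demo_clearbits_int` helpers and the flag values are hypothetical stand-ins for the `atomic_{set,clear}bits_int` interface named above, not its actual implementation.

```c
#include <stdatomic.h>

/* Hypothetical flag bits, chosen for illustration only. */
#define DEMO_FLAG_A	0x01u
#define DEMO_FLAG_B	0x02u

/* Atomically OR the given bits into the flag word. */
static inline void
demo_setbits_int(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_or(p, bits);
}

/* Atomically clear the given bits in the flag word. */
static inline void
demo_clearbits_int(atomic_uint *p, unsigned int bits)
{
	atomic_fetch_and(p, ~bits);
}
```

A plain `flags |= bit` compiles to a separate load, modify and store, so an interrupt arriving between the load and the store can have its own update overwritten; the read-modify-write helpers above avoid that window.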


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, so memory size and cpu
clock frequency are now detected.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.126 05-Dec-2018 jsg

Include srp.h where struct cpu_info uses srp to avoid erroring out when
including cpu.h machine/intr.h etc without first including param.h when
MULTIPROCESSOR is defined.

ok visa@


# 1.125 04-Dec-2018 visa

Add processor IDs for several OCTEON II and III SoCs.


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.
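
A rough sketch of the idea, assuming a hypothetical `struct demo_ipl_cpu` with only the one field: because `ci_ipl` is `volatile`, the compiler has to perform each read and write of it and may not reorder those accesses relative to each other, so no extra barrier is needed just to keep the compiler in line. The `demo_splraise`/`demo_splx` helpers are simplified illustrations, not the kernel's routines.

```c
/* Hypothetical reduced cpu_info; only the interrupt priority level is shown. */
struct demo_ipl_cpu {
	volatile int ci_ipl;	/* every access must actually be performed */
};

/* Raise the level to newipl if it is higher; return the previous level. */
static inline int
demo_splraise(struct demo_ipl_cpu *ci, int newipl)
{
	int oldipl = ci->ci_ipl;

	if (oldipl < newipl)
		ci->ci_ipl = newipl;
	return oldipl;
}

/* Restore a previously saved level. */
static inline void
demo_splx(struct demo_ipl_cpu *ci, int ipl)
{
	ci->ci_ipl = ipl;
}
```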


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@
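
To show what guarding macro arguments buys, here is a small sketch with two hypothetical macros (not taken from the header). Without parentheses the argument is substituted textually, so operator precedence can change the result.

```c
/* Unguarded: the argument is pasted into the expression as-is. */
#define DEMO_PAGES_BAD(n)	n * 4096

/* Guarded: the argument and the whole expansion are parenthesized. */
#define DEMO_PAGES_GOOD(n)	((n) * 4096)

/* Helpers so the two expansions can be compared at run time. */
static inline int
demo_bad_two_pages(void)
{
	return DEMO_PAGES_BAD(1 + 1);	/* expands to 1 + 1 * 4096 */
}

static inline int
demo_good_two_pages(void)
{
	return DEMO_PAGES_GOOD(1 + 1);	/* expands to ((1 + 1) * 4096) */
}
```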


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structure to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
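
A sketch of how such a statement sits in a busy loop. `DEMO_BUSY_CYCLE()` is a hypothetical stand-in; defining it as an empty asm with a memory clobber (GCC/Clang syntax) is an assumption of this example, as are the function name and the spin budget.

```c
#include <stdatomic.h>

/* Hypothetical stand-in: emits no instruction but acts as a compiler barrier. */
#define DEMO_BUSY_CYCLE()	__asm__ volatile("" ::: "memory")

/*
 * Spin until *flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting.
 */
static unsigned int
demo_spin_wait(atomic_int *flag, unsigned int max_spins)
{
	unsigned int spins = 0;

	while (atomic_load(flag) == 0 && spins < max_spins) {
		DEMO_BUSY_CYCLE();
		spins++;
	}
	return spins;
}
```

On an architecture with a pause-style instruction, the macro body would be the place to put it; the loop itself stays machine-independent.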


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
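
The deferral described above can be sketched as follows. Everything here is hypothetical: the `demo_pmap` structure, the fixed-size pending list and the flush counter only model the bookkeeping, and no real cache operation is performed.

```c
#include <stddef.h>

#define DEMO_MAX_PENDING	8

/* Hypothetical pmap fragment holding flushes that have not been done yet. */
struct demo_pmap {
	unsigned long pending_va[DEMO_MAX_PENDING];
	size_t npending;
	unsigned int flushes;	/* number of batched flushes performed */
};

/* Perform all pending flushes in one batch. */
static void
demo_pmap_update(struct demo_pmap *pm)
{
	if (pm->npending == 0)
		return;
	/* A real implementation would invalidate the I$ for each va here. */
	pm->flushes++;
	pm->npending = 0;
}

/* Register a flush for va instead of doing it immediately. */
static void
demo_pmap_enter(struct demo_pmap *pm, unsigned long va)
{
	if (pm->npending == DEMO_MAX_PENDING)
		demo_pmap_update(pm);	/* list full: flush early */
	pm->pending_va[pm->npending++] = va;
}
```

Several `demo_pmap_enter()` calls in a row then cost a single batched flush at `demo_pmap_update()` time rather than one flush per page.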


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by creating wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.
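
A sketch of what such a common per-cache record could look like. The struct and field names below are hypothetical, and the derivation of the set size as total size divided by number of sets is an assumption that follows this entry's use of "sets" to mean ways.

```c
/* Hypothetical per-cache description, one instance per I$, D$, L2, L3. */
struct demo_cache_info {
	unsigned int size;	/* total size in bytes */
	unsigned int linesize;	/* line size in bytes */
	unsigned int setsize;	/* size of one set (way) in bytes */
	unsigned int sets;	/* number of sets (ways) */
};

/* Fill in a record; setsize is derived for convenience. */
static void
demo_cache_init(struct demo_cache_info *c, unsigned int size,
    unsigned int linesize, unsigned int sets)
{
	c->size = size;
	c->linesize = linesize;
	c->sets = sets;
	c->setsize = (sets != 0) ? size / sets : 0;
}
```

With the way count stored in the record, a reporting routine can print it directly instead of switching on the processor type.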


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
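
The dynamic check described above amounts to a range test on the stack pointer. The following is a sketch under assumptions: the function name and parameters are hypothetical, and a size of zero is taken to mean that no alternate stack is configured.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return nonzero if sp lies within [ss_base, ss_base + ss_size).
 * The unsigned subtraction wraps to a huge value when sp < ss_base,
 * so a single comparison covers both bounds.
 */
static int
demo_on_altstack(uintptr_t sp, uintptr_t ss_base, size_t ss_size)
{
	return ss_size != 0 && (sp - ss_base) < ss_size;
}
```

Because the answer is computed from the current stack pointer rather than from a saved flag, it stays correct after a longjmp out of a signal handler, which is the case the commit message calls out.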


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful set of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.
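
A sketch of the dispatch pattern described above, with hypothetical names throughout (`demo_cache_cpu`, `Demo_SyncCache`, `demo_r5k_*`): a configuration routine fills in the function pointer, and callers go through a macro that reads it from the per-cpu structure.

```c
struct demo_cache_cpu;
typedef void (*demo_cache_fn)(struct demo_cache_cpu *);

/* Hypothetical reduced cpu_info with a single cache operation pointer. */
struct demo_cache_cpu {
	unsigned int ci_syncs;		/* counts calls, for illustration */
	demo_cache_fn ci_SyncCache;	/* set by the ConfigCache routine */
};

/* Invocation macro; a platform with one cache type could redefine it. */
#define Demo_SyncCache(ci)	((ci)->ci_SyncCache)(ci)

/* One cpu-specific implementation. */
static void
demo_r5k_SyncCache(struct demo_cache_cpu *ci)
{
	ci->ci_syncs++;
}

/* The configuration routine is responsible for filling the pointers. */
static void
demo_r5k_ConfigCache(struct demo_cache_cpu *ci)
{
	ci->ci_SyncCache = demo_r5k_SyncCache;
}
```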


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.
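
For readers unfamiliar with XKPHYS addressing, the sketch below shows how a physical address and a cache coherency attribute combine into a direct-mapped 64-bit virtual address. The `DEMO_*` macros are illustrative and are not copied from the header; they rely on the MIPS64 layout in which XKPHYS starts at 0x8000000000000000 and bits 61..59 carry the attribute, with the value 2 meaning uncached.

```c
#include <stdint.h>

/* Base of the XKPHYS region (top two address bits are 10). */
#define DEMO_XKPHYS_BASE	0x8000000000000000ULL

/* Cache coherency attribute for uncached accesses. */
#define DEMO_CCA_NC		2ULL

/* Place the attribute in bits 61..59 and the physical address below it. */
#define DEMO_PHYS_TO_XKPHYS(pa, cca) \
	(DEMO_XKPHYS_BASE | ((uint64_t)(cca) << 59) | (uint64_t)(pa))

/* Convenience wrapper for the uncached case. */
static inline uint64_t
demo_phys_to_uncached(uint64_t pa)
{
	return DEMO_PHYS_TO_XKPHYS(pa, DEMO_CCA_NC);
}
```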


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing current cpu_info address into LLAddr register, for curcpu().
Unlike the previous implementation, we no longer use the physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() is renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
This avoids using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in december 2005.
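
The XKPHYS direct mapping mentioned above can be sketched as follows. This is a minimal illustration, not the kernel's macro: it assumes only the MIPS64 architectural layout (virtual address bits 63:62 set to binary 10 select XKPHYS, bits 61:59 carry the cache coherency attribute), and the names XKPHYS_BASE_SKETCH, CCA_SHIFT_SKETCH and phys_to_xkphys() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define XKPHYS_BASE_SKETCH	0x8000000000000000ULL	/* VA bits 63:62 = 0b10 */
#define CCA_SHIFT_SKETCH	59			/* CCA lives in VA bits 61:59 */

/*
 * Build a directly-translated XKPHYS virtual address from a physical
 * address and a 3-bit cache coherency attribute (CCA).
 */
static uint64_t
phys_to_xkphys(uint64_t pa, unsigned int cca)
{
	return XKPHYS_BASE_SKETCH |
	    ((uint64_t)(cca & 0x7) << CCA_SHIFT_SKETCH) | pa;
}
```

Unlike the 32-bit compatibility segments, which each cover only the low 512MB of physical memory, this window reaches any physical address, so a physical address of 0x20000000 (the first byte past 512MB) still gets a direct mapping.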


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.124 24-Feb-2018 visa

Declare ci_ipl volatile to prevent the compiler from optimizing
or reordering accesses to the variable. Assume that the assembler
preserves the correct sequence of instructions, which allows the
removal of the explicit noreorder/reorder toggles from the C code.

With ci_ipl being volatile, drop mips_sync() calls that follow
the accesses of the variable. The sync is redundant as a compiler
barrier. In addition, the MIPS64 CPU designs should not need the
sync for pipeline or write buffer control. According to miod@,
the use of the instruction is a carryover from code targeting
early MIPS designs that lack tight integration with the cache
and write buffer.

Discussed with and testing help from miod@.
Tested on CN5020, CN6120, CN7130, CN7360, Loongson 2F and 3A1000,
R4400, R8000, R10000 and R16000.
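
As a rough sketch of why the volatile qualifier matters here, consider a simplified priority-level pair. The names cpu_sketch, splraise_sketch() and splx_sketch() are hypothetical and the logic is an assumption for illustration, not the mips64 code. With the field declared volatile, the compiler has to perform every load and store of it as written and in order relative to other volatile accesses; it is not a hardware barrier.

```c
/* Hypothetical per-CPU state; only the priority level is modelled. */
struct cpu_sketch {
	volatile int ipl;	/* current logical interrupt priority level */
};

/* Raise the level to newipl if it is higher; return the previous level. */
static int
splraise_sketch(struct cpu_sketch *ci, int newipl)
{
	int oldipl = ci->ipl;

	if (oldipl < newipl)
		ci->ipl = newipl;
	return oldipl;
}

/* Restore a level previously returned by splraise_sketch(). */
static void
splx_sketch(struct cpu_sketch *ci, int oldipl)
{
	ci->ipl = oldipl;
}
```

A caller would bracket a critical section with the two calls, saving the value returned by the first and handing it back to the second.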


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@
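
A busy loop using such a statement might look like the sketch below. BUSY_CYCLE_SKETCH() and spin_wait() are hypothetical names; the macro is left empty, matching the "empty on others" case described above, and the C11 atomic load is what keeps the flag from being hoisted out of the loop in this sketch.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for CPU_BUSY_CYCLE(); deliberately a no-op here. */
#define BUSY_CYCLE_SKETCH()	do { /* nothing */ } while (0)

/*
 * Spin until *flag becomes nonzero or the retry budget is exhausted.
 * Returns 1 if the flag was observed set, 0 on giving up.
 */
static int
spin_wait(atomic_int *flag, unsigned long budget)
{
	while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
		if (budget-- == 0)
			return 0;
		BUSY_CYCLE_SKETCH();
	}
	return 1;
}
```

On an architecture with a spin-wait hint, only the macro body would change; the loop itself stays machine-independent.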


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.
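
The deferral described above can be sketched as a small pending-range record. This is an illustration under stated assumptions, not the pmap code: pending_flush, flush_register() and flush_update() are hypothetical names, the pending work is merged into one bounding range (which may cover more than strictly needed), and the real cache invalidation is replaced by a counter.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for I$ flushes that have been postponed. */
struct pending_flush {
	uintptr_t start, end;	/* half-open [start, end); empty if equal */
	unsigned int nflushes;	/* number of flushes actually performed */
};

/* Called where pmap_enter() would have flushed: only record the range. */
static void
flush_register(struct pending_flush *pf, uintptr_t va, uintptr_t len)
{
	if (pf->start == pf->end) {
		pf->start = va;
		pf->end = va + len;
	} else {
		if (va < pf->start)
			pf->start = va;
		if (va + len > pf->end)
			pf->end = va + len;
	}
}

/* Called where pmap_update() runs: perform one flush for everything. */
static void
flush_update(struct pending_flush *pf)
{
	if (pf->start != pf->end) {
		/* a real kernel would invalidate the I$ for the range here */
		pf->nflushes++;
		pf->start = pf->end = 0;
	}
}
```

Three neighbouring pages registered in a row then cost a single flush at update time instead of three.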


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to work around the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@
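
The dynamic check described above amounts to a range test on the stack pointer. The sketch below is an assumption-laden illustration, with on_altstack() a hypothetical helper rather than the kernel's function; it treats the alternate stack as the half-open range [base, base + size) and relies on unsigned wraparound to reject pointers below the base.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if sp lies inside the alternate signal stack described by
 * base and size, 0 otherwise. If sp < base, the unsigned subtraction
 * wraps to a huge value and the comparison fails, so one test covers
 * both bounds.
 */
static int
on_altstack(uintptr_t sp, uintptr_t base, size_t size)
{
	return size != 0 && sp - base < size;
}
```

Because nothing is cached in a flag, a longjmp out of the handler needs no bookkeeping: the next check simply sees a stack pointer outside the range.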


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing the current cpu_info address into the LLAddr register, for curcpu().
Unlike the previous implementation, we won't use the physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns the XKPHYS address of the allocated memory.
It is meant to avoid using virtual addresses on secondary cpus in the early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings and uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in December 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanic
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.123 29-Jan-2018 visa

Drop unused field `ci_ipiih'.


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structure to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to workaround the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful set of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing current cpu_info address into LLAddr register, for curcpu().
Unlike the previous implementation, we won't use physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() renamed to
alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also few xheart modification for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns XKPHYS address of allocated memory.
It's to avoid using virtual addresses on secondary cpus in early stage, and also in TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will form the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it has been unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in December 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, thus detects memorysize and cpu
clock frequency.


# 1.1 06-Aug-2004 pefo

initial mips64


# 1.122 21-Oct-2017 visa

Use MI mplock on mips64.

OK mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.121 02-Sep-2017 visa

Let the kernel utilize the FPU if one is available, even when the
FPUEMUL option is enabled. This benefits OCTEON III systems which can
run floating-point operations natively.

Feedback from and OK miod@; he also helped with testing.

Tested on octeon without FPU (CN5020, CN6120) and with FPU (CN7130),
as well as on sgi/IP27 (MP R16000), sgi/IP32 (R5000), and
loongson (3A1000).


# 1.120 30-Jul-2017 visa

Define MAXCPUS per mips64 port.


# 1.119 12-Jul-2017 natano

remove CPU_LIDSUSPEND/machdep.lidsuspend

"fire away!" tedu


# 1.118 11-Jun-2017 visa

Fix TLB size computation on OCTEON II and III. The CPUs have utilized
the whole TLB space even before this. However, TLB initialization on
boot and TLB flush on ASID wraparound have been incomplete. These have
caused crashes of processes.


# 1.117 24-May-2017 visa

Add an idle cycle implementation for R4600/R5000/RM7000 CPUs and their
derivatives. This lets the kernel utilize the CPUs' Standby Mode to
reduce the power consumption of an idle system.

Suggested by and input from miod@.
He also tested this patch on an RM7000 O2.


# 1.116 20-Apr-2017 visa

Make TCB address available to userspace via the UserLocal register.
This lets programs get the address without a system call on OCTEON II
and later.

Add UserLocal load emulation for systems that do not implement
the RDHWR instruction or the UserLocal register.

OK guenther@


# 1.115 07-Apr-2017 visa

Add prid for CN72xx/CN73xx.


Revision tags: OPENBSD_6_1_BASE
# 1.114 02-Mar-2017 natano

Add a new sysctl machdep.lidaction. The sysctl works as follows:

machdep.lidaction=0 # do nothing
machdep.lidaction=1 # suspend
machdep.lidaction=2 # hibernate

lidsuspend is just an alias for lidaction, so if you change one, the
other one will have the same value. The plan is to remove
machdep.lidsuspend eventually when people have upgraded their
/etc/sysctl.conf.

discussed with deraadt, who came up with the new MIB name
no objections mlarkin
ok stsp halex jcs


# 1.113 17-Dec-2016 visa

Make Octeon model strings a bit more specific. While there,
add CN70xx/CN71xx.


# 1.112 16-Dec-2016 fcambus

Provide the "machdep.lidsuspend" sysctl on Loongson.

OK visa@


# 1.111 14-Aug-2016 visa

Utilize the TLB Execute-Inhibit bit with non-executable mappings on CPUs
that support the Execute-Inhibit exception. This makes user space W^X
effective on Octeon Plus and later Octeon versions.

Feedback from miod@, thanks!
No objection from deraadt@


Revision tags: OPENBSD_6_0_BASE
# 1.110 06-Mar-2016 mpi

Rename mips64's trap_frame into trapframe.

For coherency with other archs and in order to use it in MI code.

ok visa@, tobiasu@


# 1.109 01-Mar-2016 mmcc

guard macro args with parens

from Michal Mazurek, ok deraadt@


Revision tags: OPENBSD_5_9_BASE
# 1.108 05-Jan-2016 visa

Some implementations of HitSyncDCache() call pmap_extract() for va->pa
conversion. Because pmap_extract() acquires the PTE mutex, a "locking
against myself" panic is triggered if the cache routine gets called in
a context where the mutex is already held.

In the pmap, all calls to HitSyncDCache() are for a whole page. Add a
new cache routine, HitSyncDCachePage(), which gets both the va and the
pa of a page. This removes the need of the va->pa conversion. The new
routine has the same signature as SyncDCachePage(), allowing reuse of
the same routine for cache implementations that do not need differences
between "Hit" and non-"Hit" routines.

With the diff, POWER Indigo2 R8000 boots multiuser again. Tested on sgi
GENERIC-IP27.MP and octeon GENERIC.MP, too.

Diff from miod@, ok kettenis@


# 1.107 25-Dec-2015 visa

Make interrupt masking MP-aware. Linux IP27 and IP35 ports served as a
substitute for hardware documentation.


# 1.106 23-Sep-2015 miod

That PICA reference ought to have been removed 20 years ago!


Revision tags: OPENBSD_5_8_BASE
# 1.105 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. a bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos im too close to the implementation.

ok mpi@


Revision tags: OPENBSD_5_7_BASE
# 1.104 11-Feb-2015 dlg

no md code wants lockmgr locks, so no md code needs to include sys/lock.h

with and ok miod@


# 1.103 14-Aug-2014 tobias

fixed overrid(d)en typo

millert@ and jmc@ agree that "overriden" is wrong


Revision tags: OPENBSD_5_6_BASE
# 1.102 11-Jul-2014 uebayasi

CPU_BUSY_CYCLE(): A new MI statement for busy loop power reduction

The new CPU_BUSY_CYCLE() may be put in a busy loop body so that CPU can reduce
power consumption, as Linux's cpu_relax() and FreeBSD's cpu_spinwait(). To
start minimally, use PAUSE on i386/amd64 and empty on others. The name is
chosen following the existing cpu_idle_*() functions. Naming and API may be
polished later.

OK kettenis@


# 1.101 04-Apr-2014 miod

Second step of the R4000 EOP errata WAR: when pmap invalidates a page which
is currently being covered by the wired TLB entries, flush them, so that,
if the process' pc is still running in a vulnerable page, the WAR will
reapply immediately and fault the next page.


# 1.100 31-Mar-2014 miod

Due to the virtually indexed nature of the L1 instruction cache on most mips
processors, every time a new text page is mapped in a pmap, the L1 I$ is
flushed for the va spanned by this page.

Since we map pages of our binaries upon demand, as they get faulted in, but
uvm_fault() tries to map the few neighbour pages, this can end up in a
bunch of pmap_enter() calls in a row, for executable mappings. If the L1
I$ is small enough, this can cause the whole L1 I$ cache to be flushed
several times.

Change pmap_enter() to postpone these flushes by only registering the
pending flushes, and have pmap_update() perform them. The cpu-specific
cache code can then optimize this to avoid unnecessary operations.

Tested on R4000SC, R4600SC, R5000SC, RM7000, R10000 with 4KB and 16KB
page sizes (coherent and non-coherent designs), and Loongson 2F by mikeb@ and
me. Should not affect anything on Octeon since there is no way to flush a
subset of I$ anyway.


# 1.99 29-Mar-2014 guenther

It's been a quarter century: we can assume volatile is present with that name.

ok dlg@ mpi@ deraadt@


# 1.98 22-Mar-2014 miod

Second draft of my attempt to work around the infamous R4000 end-of-page errata,
affecting R4000 processors revision 2.x and below (found on most R4000 Indigo
and a few R4000 Indy).

Since this errata gets triggered by TLB misses when the code flow crosses a
page boundary, this code attempts to identify code pages prone to trigger the
errata, and force the next page to be mapped for at least as long as the
current pc lies in the troublesome page, by wiring extra TLB entries.
These entries get recycled in a lazy-but-aggressive-enough way, either because
of context switches, or because of further tlb exceptions reaching trap().

The errata workaround code is only compiled on R4000-capable kernels (i.e.
sgi GENERIC-IP22 and nothing else), and only enabled on affected processors
(i.e. not on R4000 revision 3, or on R4400).

There is still room for improvement in unlucky cases, but in this simple enough
incarnation, this allows my R4000 2.2 Indigo to finally reliably boot multiuser,
even though both /sbin/init and /bin/sh contain code pages which can trigger
the errata.


# 1.97 21-Mar-2014 miod

Rename db_inst_type() into classify_insn() and make that function available
outside of ddb. It will be used by regular kernel code shortly.


# 1.96 09-Mar-2014 miod

Rework the per-cpu cache information. Use a common struct to store the line
size, the number of sets, and the total size (and the set size, for convenience)
per cache (I$, D$, L2, L3).
This allows cpu.c to print the number of ways (sets) of L2 and L3 caches from
the cache information, rather than hardcoding this from the processor type.


Revision tags: OPENBSD_5_5_BASE
# 1.95 19-Dec-2013 jasper

recognize octeon 2 cpus; as found in the lanner mr326

ok miod@


Revision tags: OPENBSD_5_4_BASE
# 1.94 12-Mar-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffers and teach
kgmon(8) to deal with them, this time without public header changes.

Previously various CPUs were iterating over the same global buffer at
the same time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok deraadt@, mikeb@, haesbaert@


Revision tags: OPENBSD_5_3_BASE
# 1.93 12-Feb-2013 mpi

Back out per-CPU kernel profiling, it shouldn't modify a public header
at this moment.


# 1.92 11-Feb-2013 mpi

Fix kernel profiling on MP systems by using per-CPU buffer. Previously
various CPUs were iterating over the same global buffer at the same
time to modify it and never ended.

This diff includes some ideas submitted by Thor Simon to NetBSD via miod@.

ok mikeb@, haesbaert@


# 1.91 02-Dec-2012 guenther

Determine whether we're currently on the alternative signal stack
dynamically, by comparing the stack pointer against the altstack
base and size, so that you get the correct answer if you longjmp
out of the signal handler, as tested by regress/sys/kern/stackjmp/.
Also, fix alt stack handling on vax, where it was completely broken.

Testing and corrections by miod@, krw@, tobiasu@, pirofti@


# 1.90 03-Oct-2012 miod

Split ever-growing mips <machine/cpu.h> into what 99% of the kernel needs,
which will remain in <machine/cpu.h>, and a new mips_cpu.h containing only the
goriest md details, which are only of interest to a handful of files; this
is similar in spirit to what alpha does, but here <machine/cpu.h> does not
include the new file.


# 1.89 29-Sep-2012 miod

Basic R8000 processor support. R8000 processors require MMU-specific code,
exception-specific code, clock-specific code, and L1 cache-specific code. L2
cache is per-design, of which only two exist: SGI Power Indigo2 (IP26) and SGI
Power Challenge (IP21) and are not covered by this commit.

R8000 processors also are 64-bit only processors with 64-bit coprocessor 0
registers, and lack so-called ``compatibility'' memory spaces allowing 32-bit
code to run with sign-extended addresses and registers.

The intrusive changes are covered by #ifdef CPU_R8000 stanzas. However,
trap() is split into a high-level wrapper and a new function, itsa(),
responsible for the actual trap servicing (which name couldn't be helped
because I'm an incorrigible punster). While an R8000 exception may cause
(via trap() ) multiple exceptions to be serviced, non-R8000 processors will
always service one exception in trap(), but they are nevertheless affected
by this code split.


# 1.88 29-Sep-2012 miod

Forgot this in previous commit


# 1.87 29-Sep-2012 miod

Handle the coprocessor 0 cause and status registers as a 64 bit value now,
as some odd mips designs need more than 32 bits in there. This causes a lot
of mechanical changes everywhere getsr() is used.


# 1.86 29-Sep-2012 miod

Add a few more coprocessor 0 cause and config registers defines.


# 1.85 29-Sep-2012 miod

Kill the mostly unused VMTLB_xxx and VMNUM_xxx defines. Move all tlb
knowledge to <machine/pte.h>. Add specific routines for tlb handling setup
(at cpu initialization time) and tlb ASID wrap.


# 1.84 29-Sep-2012 miod

Provide a mips_sync() macro to wrap asm("sync"), and replace gazillions of
such statements with it.


Revision tags: OPENBSD_5_2_BASE
# 1.83 14-Jul-2012 miod

Split the existing mips64 clock code into time-of-day and generic duties in
machdep.c, and internal clock interrupting on level 5, still in clock.c; this
will allow other clock sources to be used in the near future. (delay() will
remain tied to the internal clock)


# 1.82 24-Jun-2012 miod

Add cache operation functions pointers to struct cpu_info; the various
cache lines and sizes are already there, after all.

The ConfigCache cache routine is responsible for filling these function
pointers; cache routine invocation macros are updated to use the cpu_info
fields, but may still be overridden in <machine/cpu.h> on platforms where
only one set of cache routines is used.


# 1.81 27-May-2012 miod

Add a `L2 cache line size' member to struct cpu_info. This allows R4k code to
stop abusing another field, and will be used by more routines RSN.

No functional change.


# 1.80 19-Apr-2012 miod

Print the currently active ASID in `machine tlb' ddb command.


# 1.79 06-Apr-2012 miod

Make the logic for PMAP_PREFER() and the logic, inside pmap, to do the
necessary cache coherency work wrt similar virtual indexes of different
physical pages, depending upon two distinct global variables, instead of
a shared one. R4000/R4400 VCE requires a 32KB mask for PMAP_PREFER, which
is otherwise not necessary for pmap coherency (especially since, on these
processors, only L1 uses virtual indexes, and the L1 size is not greater
than the page size, as we are using 16KB pages).


# 1.78 28-Mar-2012 miod

Work in progress support for the SGI Indigo, Indigo 2 and Indy systems
(IP20, IP22, IP24) in 64-bit mode, adapted from NetBSD. Currently limited
to headless operation, input and video drivers will get ported soon.

Should work on all R4000, R4400 and R5000 based systems. L2 cache on R5000SC
Indy not supported yet (coming soon), R4600 not supported yet either (coming
soon as well).

Tested to boot multiuser on: Indigo2 R4000SC, Indy R4000PC, Indy R4000SC,
Indy R5000SC, Indigo2 R4400SC. There are still glitches in the Ethernet driver
which are being looked at.

Expansion support is limited to the GIO E++ board; GIO boards with PCI-GIO
bridges not ported yet due to the lack of hardware, and this kind of driver
does not port blindly.

Most of this work comes from NetBSD, polishing and integration work, as well
as putting as many ``R4x00 in 64-bit mode'' erratas as necessary, by yours
truly.

More work is coming, as well as trying to get some easy way to boot install
kernels (as older PROM can only boot ECOFF binaries, which won't do for the
kernel).


# 1.77 25-Mar-2012 miod

Move cache handling routines related definitions to a dedicated header file,
rather than abusing <machine/cpu.h>.


# 1.76 24-Mar-2012 miod

The various ConfigCache() functions actually return void, not int.


# 1.75 24-Mar-2012 miod

Add a few trivial routines to get mips64r2 specific config registers. Not used
by anything yet, but has been lying in one of my trees for too long.


# 1.74 19-Mar-2012 miod

Use uncached addresses for all exception vectors, when copying our code (or
trampolines) to them; this makes sure there is no risk of pending writes
being lost when we clear the caches. Of course, this would be a bug in the
cache handling routines, but having our vectors correctly set will help
debugging the issue.
Tested on sgi and loongson.


# 1.73 15-Mar-2012 miod

uncached_base was introduced early in IP27 support, since these designs use
subspaces in the CCA_NC uncached memory space. However, being coherent,
there was never a need for bus_dma to use uncached addresses.

This means that, on the only systems where uncached_base was not set to
PHYS_TO_XKPHYS(0, CCA_NC), it was never used.

Remove the variable, and replace PHYS_TO_UNCACHED() with
PHYS_TO_XKPHYS(, CCA_NC). No functional change.


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 24-Jun-2011 naddy

machdep.kbdreset enables a shutdown by Ctrl-Alt-Del on amd64 and
i386. Stop abusing it on other archs for controlling a shutdown by
pressing the soft power button:

* Add a MI sysctl hw.allowpowerdown; if set to 1 (the default) it
allows a power button shutdown.
* Make acpi(4)/acpibtn(4) honor hw.allowpowerdown.
* Switch the various power button intercepts on landisk, sgi, sparc64
and zaurus over to hw.allowpowerdown.
* Garbage collect the machdep.kbdreset sysctl on all archs other than
amd64 and i386.

ok miod@


# 1.71 31-Mar-2011 miod

Recognize Loongson 3A processors, but don't accept to run on them yet, the
cache routines are not ready. This is mostly low-hanging fruit.


# 1.70 23-Mar-2011 pirofti

Normalize sentinel. Use _MACHINE_*_H_ and _<ARCH>_*_H_ properly and consistently.

Discussed and okay drahn@. Okay deraadt@.


Revision tags: OPENBSD_4_9_BASE
# 1.69 24-Nov-2010 miod

Floating-point emulation code for systems lacking proper FPU (i.e. Octeon),
enabled by option FPUEMUL.

This is pretty straightforward, except for conditional branch on FPU condition
codes emulation (bc1f/bc1fl/bc1t/bc1tl instructions): unlike most
RISC-with-delay-slots designs (m88k, sparc), the branch pipeline is not exposed
to the kernel on Mips, therefore we can not resume a branch without losing the
delay slot instruction.

Some other operating systems work around this issue by emulating the delay
slot instruction, but this is error-prone (and requires the kernel code to
be aware of all supported instructions of the processor it is currently running
on), some use dedicated breakpoints to single-step through the delay slot and
then resume the branch as expected, but this causes a lot of copy-on-write
allocations.

This code chooses a third path, of copying the delay slot instructions to run to
a special `magic' page, followed by a special trap instruction to give control
back to the kernel. This makes sure the instruction will actually be run by the
processor, and that no more than one page per process is wasted, regardless of
the number of branches to emulate.

Tested on octeon (big-endian) by syuu@ and on loongson (little-endian) by me.
Note that enabling option FPUEMUL in the kernel will completely disable the
hardware FPU, if there is one; there is currently no way to build a kernel
supporting both hardware and software FPU, and there is no reason to change
this until there is a strong need to support both.


# 1.68 24-Oct-2010 miod

Move build_trampoline() and setregs() to a common location for all mips ports.


# 1.67 02-Oct-2010 syuu

Added octeon specific cop0 registers. ok miod@


# 1.66 28-Sep-2010 miod

Implement a per-cpu held mutex counter if DIAGNOSTIC on all non-x86 platforms,
to complete matthew@'s commit of a few days ago, and drop __HAVE_CPU_MUTEX_LEVEL
define. With help from, and ok deraadt@.


# 1.65 21-Sep-2010 miod

Replace the old floating point completion code with a C interface to the
MI softfloat code, implementing all MIPS IV specified floating point
operations.
Tested on R5000, R10000, R14000 and Loongson2F.


# 1.64 20-Sep-2010 syuu

cache operations for octeon. ok miod@


# 1.63 17-Sep-2010 miod

Protect a few more defines with _KERNEL checks, and also allow some of them
to be visible if _STANDALONE. This will eventually be used by the upcoming
new-and-improved loongson bootblocks (in the works).


# 1.62 13-Sep-2010 syuu

Added OCTEON in cpu type. ok miod@


# 1.61 12-Sep-2010 miod

Stricter types in MipsEmulateBranch(), and related cleanups.
No functional change.


# 1.60 11-Sep-2010 syuu

move machine dependent GET_CPU_INFO(), getcurcpu(), setcurcpu() to arch/sgi. ok miod@


# 1.59 30-Aug-2010 syuu

ddbcpu for sgi. ok miod@


Revision tags: OPENBSD_4_8_BASE
# 1.58 28-Apr-2010 syuu

Storing current cpu_info address into LLAddr register, for curcpu().
Unlike the previous implementation, we won't use the physical cpuid to fetch curcpu().
This is required to implement IP27/35 SMP.
Implemented getcurcpu() and setcurcpu() for it; smp_malloc() renamed to alloc_contiguous_pages() because it now only allocates by page.
ok miod@


Revision tags: OPENBSD_4_7_BASE
# 1.57 28-Feb-2010 miod

Pass L2 cache size in struct cpu_hwinfo, so that bootstrap of secondary
processors can display correct data. Now cpu1 on octane is correctly
reported in dmesg.


# 1.56 28-Feb-2010 miod

Add an explicit `delay constant' member to struct cpu_info, so that it can
be decoupled from the nominal processor speed.
While there, make sure delay() gets a proper delay constant if invoked before
cpu0 attaches (how could I miss that when introducing struct cpu_hwinfo?!?)


# 1.55 18-Jan-2010 miod

Define IPL_SCHED as IPL_CLOCK, not IPL_HIGH.


# 1.54 09-Jan-2010 miod

Make interrupt depth counters per-cpu.


# 1.53 09-Jan-2010 miod

Move cache information from global variables to per-cpu_info fields; this
allows processors with different cache sizes to be used.

Cache management routines now take a struct cpu_info * as first parameter.


# 1.52 09-Jan-2010 miod

Define struct cpu_hwinfo, to hold hardware specific information about each
processor (instead of sys_config.cpu[]), and pass it in the attach_args
when attaching cpu devices.

This allows per-cpu information to be gathered late in the bootstrap process,
and not be limited by an arbitrary MAX_CPUS limit; this will suit IP27 and
IP35 systems better.

While there, use this information to make sure delay() uses the speed
information from the cpu it is invoked on.


# 1.51 08-Jan-2010 syuu

MP-safe FPU handling. ok miod@


# 1.50 30-Dec-2009 syuu

curcpu()->ci_curpmap added. ok miod@


# 1.49 28-Dec-2009 syuu

MP-safe pmap implemented, enable IPI in interrupt handler to avoid deadlock.
ok miod@


# 1.48 25-Dec-2009 miod

Pass both the virtual address and the physical address of the memory range
when invoking the cache functions. The physical address is needed when
operating on physically-indexed caches, such as the L2 cache on Loongson
processors.

Preprocessor abuse makes sure that the physical address computation gets
compiled out when running on a kernel compiled for virtually-indexed
caches only, such as the sgi kernel.


# 1.47 07-Dec-2009 miod

Support for 16KB page size kernels; page size is now set in <machine/param.h>
rather than <mips64/param.h>.

For now, kernels are kept at 4KB to give people some time to build 16KB
compatible binaries; this will change before the end of this release cycle.

Use of 16KB page size kernels yields a 18% speedup (which, offset by the
1.6% slowdown caused by the pmap changes, yields a 16.6% overall speedup).


# 1.46 25-Nov-2009 syuu

IP30 IPI implementation.
Also a few xheart modifications for SMP.
ok miod@


# 1.45 24-Nov-2009 syuu

smp_malloc() implemented.
This function allocates memory using malloc or uvm_pglistalloc, then returns XKPHYS address of allocated memory.
This avoids using virtual addresses on secondary cpus in early stage, and also in the TLB handler.
ok miod@


# 1.44 22-Nov-2009 syuu

SMP support on MIPS clock.
ok miod@


# 1.43 19-Nov-2009 miod

Rename KSEG* defines to CKSEG* to match their names in 64 bit mode; also
define more 64 bit spaces.


# 1.42 30-Oct-2009 syuu

Support IP30 secondary cpu bootup. ok miod@


# 1.41 22-Oct-2009 miod

Completely overhaul interrupt handling on sgi. Cpu state now only stores a
logical IPL level, and per-platform (IP27/IP30/IP32) code will from the
necessary hardware mask registers.

This allows the use of more than one interrupt mask register. Also, the
generic (platform independent) interrupt code shrinks a lot, and the actual
interrupt handler chains and masking information is now per-platform private
data.

Interrupt dispatching is generated from a template; more routines will be
added to the template to reduce platform-specific changes and share as much
code as possible.

Tested on IP27, IP30, IP32 and IP35.


# 1.40 22-Oct-2009 miod

With the splx() changes, it is no longer necessary to remember which interrupt
sources were masked and saved in ci_ipending, as splx() will unmask what needs
to be unmasked anyway. ci_ipending only now needs to store pending soft
interrupts, so rename it to ci_softpending.


# 1.39 22-Oct-2009 miod

Replace intrmask_t with uint32_t. This type only describes interrupt masks
in the coprocessor 0 status register (coupled with ICR on rm7k/rm9k), and
may be completely alien to real hardware interrupt masks, so don't make
things unnecessarily confusing.


# 1.38 07-Oct-2009 syuu

ipending, cpl moved into cpu_info
OK miod@


# 1.37 30-Sep-2009 syuu

curproc, curprocpaddr moved into cpu_info
OK miod@


# 1.36 15-Sep-2009 syuu

cpu status flag, cpuid added to cpu_info.
cpu_info pointer array, cpu_info iterator, cpu_number() implementation added.
constraint modifier fixed in lock.h to output correct assembly.
calling proc_trampoline_mp in exception.S.


# 1.35 06-Aug-2009 miod

Make sure <machine/cpu.h> includes <machine/intr.h> when included with _LOCORE
defined; cp0access.S relies on this.


# 1.34 06-Aug-2009 miod

Work in progress support for Loongson2E/2F processors; need option CPU_LOONGSON2
in the kernel to be brought in, due to invasive differences in tlb operation.
Comes with a separate cache operations file due to the cache being R5k-style
with R10k-style way number encoding.


Revision tags: OPENBSD_4_6_BASE
# 1.33 10-Jun-2009 miod

Switch sgi to per-process AST, and move ast() from interrupt.c to trap.c
where it can use userret() instead of duplicating it.


# 1.32 02-Jun-2009 miod

Add an r10k-specific cop0 control register.


# 1.31 22-May-2009 miod

Drop almost unused <machine/psl.h> on sgi; move USERMODE() definition from
there to trap.c which is its only user. This also cleans up multiple
inclusion of <machine/cpu.h> (because <machine/psl.h> includes it) in many
places.


# 1.30 26-Mar-2009 oga

Remove cpu_wait(). Its original use was to be called from the reaper so
MD code would free resources that couldn't be freed until we were no
longer running in that processor. However, it is unused on all
architectures since mikeb@'s tss changes on x86 earlier in the year.

ok miod@


Revision tags: OPENBSD_4_5_BASE
# 1.29 15-Oct-2008 deraadt

make random(9) return per-cpu values (by saving the seed in the cpuinfo),
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn


# 1.28 10-Oct-2008 art

Add empty cpu_unidle() macros for architectures that currently don't do
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.


# 1.27 10-Oct-2008 art

Define MAXCPUS on all architectures.
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).


# 1.26 09-Oct-2008 art

Implement CPU_INFO_UNIT for everyone, not just MP kernels.
ok miod@


Revision tags: OPENBSD_4_4_BASE
# 1.25 18-Jul-2008 art

Add a macro that clears the want_resched flag that need_resched sets.
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.

Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).

miod@ ok


# 1.24 07-Apr-2008 miod

Add ``guarded'' word read and write routines, to be used by machine-dependent
code soon. Similar to what ddb does, but does not need ddb to be compiled in.


# 1.23 07-Apr-2008 miod

Define more cache coherency attributes, as well as R10k space identifiers.
Define a symbolic ``cached'' attribute, to be used for cached mappings
regardless of the system's cache coherency.


Revision tags: OPENBSD_4_3_BASE
# 1.22 18-Dec-2007 jasper

add power(4), a driver for the power button found on SGI O2's.
when machdep.kbdreset is set, and the correct interrupt is fired,
the machine gets shut down.

with help from and ok jsing@, ok miod@


# 1.21 25-Nov-2007 jmc

spelling fixes, from Martynas Venckus;


Revision tags: OPENBSD_4_2_BASE
# 1.20 18-Jul-2007 miod

bus_dmamem_map() maps with a single segment in directly-translated XKPHYS
space, either cache coherent for regular mappings or uncached for
BUS_DMA_COHERENT mappings, as done on all other platforms with direct mappings.


# 1.19 18-Jun-2007 miod

Use a shorter form to load XKPHYS constants in .S code, shaves a few text
bytes, no functional change.


# 1.18 07-May-2007 kettenis

Move sgi to __HAVE_CPUINFO.

ok miod@


# 1.17 03-May-2007 miod

Enable support for > 512MB of physical memory on mips64 systems, by using
XKPHYS instead of KSEG[01] for direct mappings.

Then, detect memory above 256MB on O2 by poking at the CRIME registers
(ARCbios will not report memory above 256MB, which is mapped above 1GB
physical, to the system), and add it to the UVM managed memory.

Tested on r5k, rm5200 and r10k with and without more than 256MB, matching
hinv reports in all cases. CRIME memory decoding based on a diff from
kettenis@ in December 2005.


# 1.16 10-Apr-2007 miod

Remove long dead definitions. No functional change.


# 1.15 15-Mar-2007 art

Since p_flag is often manipulated in interrupts and without biglock
it's a good idea to use atomic.h operations on it. This mechanical
change updates all bit operations on p_flag to atomic_{set,clear}bits_int.

Only exception is that P_OWEUPC is set by MI code before calling
need_proftick and it's automatically cleared by ADDUPC. There's
no reason for MD handling of that flag since everyone handles it the
same way.

kettenis@ ok


Revision tags: OPENBSD_4_1_BASE
# 1.14 24-Dec-2006 miod

Define PROC_PC. Then, since profiling information is being reported in
statclock(), do not bother doing this in userret() anymore. As a result,
userret() does not need its pc and ticks arguments, simplify.


# 1.13 29-Nov-2006 miod

Remove cpu_swapin() and cpu_swapout(), they are no longer necessary (except
for cpu_swapin() on hppa* which is kept).


Revision tags: OPENBSD_3_9_BASE OPENBSD_4_0_BASE
# 1.12 02-Jan-2006 miod

Kill enablertclock.


Revision tags: OPENBSD_3_8_BASE
# 1.11 07-Aug-2005 miod

Remove advertising clause from UCB licenses; ok deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.10 11-Nov-2004 pefo

say hello to XKSEG0 and XKSEG1!


# 1.9 20-Oct-2004 pefo

Fix some 64 bit address problems.
Some function names made more unique.
Other changes for the upcoming Origin 200 support.


# 1.8 27-Sep-2004 pefo

Rewrite parts of the interrupt system to achieve:

o Remove do_pending code and take a real int instead. The performance
impact seems to be very low and it simplifies the code considerably.

o Allow interrupt nesting at first level. Run softints with HW ints
enabled.


# 1.7 21-Sep-2004 miod

Nuke commons.


# 1.6 20-Sep-2004 pefo

Add support for R10K cpu class


Revision tags: OPENBSD_3_6_BASE
# 1.5 09-Sep-2004 pefo

these should have gone in with the other 64 bit changes


# 1.4 15-Aug-2004 pefo

remove LP32 defs not used


# 1.3 10-Aug-2004 deraadt

spacing


# 1.2 09-Aug-2004 pefo

Big cleanup. Removed some unused obsolete stuff and fixed copyrights
on some files. Arcbios support is now in, so memory size and cpu
clock frequency are now detected.


# 1.1 06-Aug-2004 pefo

initial mips64