History log of /freebsd-current/sys/amd64/amd64/cpu_switch.S
Revision Date Author Comments
# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# df013409 03-Oct-2020 Konstantin Belousov <kib@FreeBSD.org>

amd64: Store full 64bit of FIP/FDP for 64bit processes when using XSAVE.

If current process is 64bit, use rex-prefixed version of XSAVE
(XSAVE64). If current process is 32bit and CPU supports saving
segment registers cs/ds in the FPU save area, use non-prefixed variant
of XSAVE.

Reported and tested by: Michał Górny <mgorny@moritz.systems>
PR: 250043
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26643


# f446480b 23-Aug-2020 Konstantin Belousov <kib@FreeBSD.org>

amd64: Handle 5-level paging on wakeup.

We can switch into long mode directly with LA57 enabled.

Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D25273


# ea602083 20-May-2020 Konstantin Belousov <kib@FreeBSD.org>

amd64: Add a knob to flush RSB on context switches if machine has SMEP.

The flush is needed to prevent cross-process ret2spec, which is not handled
on kernel entry if IBPB is enabled but SMEP is present.
While there, add i386 RSB flush.

Reported by: Anthony Steinhauser <asteinhauser@google.com>
Reviewed by: markj, Anthony Steinhauser
Discussed with: philip
admbugs: 961
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 98158c75 10-Nov-2019 Konstantin Belousov <kib@FreeBSD.org>

amd64: move common_tss into pcpu.

This saves some memory, around 256K I think. It removes some code,
e.g. KPTI does not need to specially map common_tss anymore. Also,
common_tss becomes domain-local.

Reviewed by: jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22231


# 5e921ff4 25-Oct-2019 Konstantin Belousov <kib@FreeBSD.org>

amd64: move pcb out of kstack to struct thread.

This saves 320 bytes of the precious stack space.

The only negative aspect of the change I can think of is that the
struct thread increased by 320 bytes obviously, and that 320 bytes are
not swapped out anymore. I believe the freed stack space is much more
important than that. Also, current struct thread size is 1392 bytes
on amd64, so UMA will allocate two thread structures per (4KB) slab,
which leaves a space for pcb without increasing zone memory use.
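
The slab arithmetic above can be checked with a small sketch. It assumes the
sizes quoted in the message and ignores any per-slab header or alignment
padding that UMA may add; items_per_slab() is a hypothetical helper, not a
kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes quoted in the commit message. */
#define SLAB_SIZE	4096u	/* one 4KB slab */
#define THREAD_SIZE	1392u	/* struct thread before the change */
#define PCB_SIZE	320u	/* pcb moved from the kstack into struct thread */

/*
 * Number of items of item_size bytes that fit in one slab.
 * Assumption: no per-slab header and no alignment padding.
 */
static size_t
items_per_slab(size_t item_size)
{
	assert(item_size > 0 && item_size <= SLAB_SIZE);
	return (SLAB_SIZE / item_size);
}
```

Under these assumptions both items_per_slab(THREAD_SIZE) and
items_per_slab(THREAD_SIZE + PCB_SIZE) give 2, and a slab holding two
1712-byte structures still has 672 bytes to spare.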

Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D22138


# 73dd84a7 28-Aug-2019 Mateusz Guzik <mjg@FreeBSD.org>

amd64: clean up cpu_switch.S

- LK macro (conditional on SMP for the lock prefix) is unused
- SETLK unnecessarily performs xchg. obtained value is never used and the
implicit lock prefix adds avoidable cost. Barrier provided by it does
not appear to be of any use.
- the lock waited for is almost never blocked, yet the loop starts with
a pause. Move it out of the common case.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19563


# a9262f49 16-Mar-2019 Konstantin Belousov <kib@FreeBSD.org>

amd64: rewrite cpu_switch.S fragment to reload tss.rsp0 on context switch.

New code avoids jumps.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514


# d1a07e31 13-Jun-2018 Konstantin Belousov <kib@FreeBSD.org>

Enable eager FPU context switch by default on amd64.

With compilers making increasing use of vector instructions the
performance benefit of lazily switching FPU state is no longer a
desirable tradeoff. Linux switched to eager FPU context switch some
time ago, and the idea was floated on the FreeBSD-current mailing list
some years ago[1].

Enable eager FPU context switch by default on amd64, with a tunable/sysctl
available to turn it back off.

[1] https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055198.html

Reviewed by: jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation


# 27275f8a 26-Apr-2018 Tycho Nightingale <tychon@FreeBSD.org>

Expand the checks for UCR3 == PMAP_NO_CR3 to enable processes to be
excluded from PTI.

Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15100


# fc2a8776 20-Mar-2018 Ed Maste <emaste@FreeBSD.org>

Rename assym.s to assym.inc

assym is only to be included by other .s files, and should never
actually be assembled by itself.

Reviewed by: imp, bdrewery (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14180


# bd50262f 17-Jan-2018 Konstantin Belousov <kib@FreeBSD.org>

PTI for amd64.

The implementation of the Kernel Page Table Isolation (KPTI) for
amd64, first version. It provides a workaround for the 'meltdown'
vulnerability. PTI is turned off by default for now, enable with the
loader tunable vm.pmap.pti=1.

The pmap page table is split into kernel-mode table and user-mode
table. Kernel-mode table is identical to the non-PTI table, while
usermode table is obtained from kernel table by leaving userspace
mappings intact, but only leaving the following parts of the kernel
mapped:

kernel text (but not modules text)
PCPU
GDT/IDT/user LDT/task structures
IST stacks for NMI and doublefault handlers.

Kernel switches to user page table before returning to usermode, and
restores full kernel page table on the entry. Initial kernel-mode
stack for PTI trampoline is allocated in PCPU, it is only 16
qwords. Kernel entry trampoline switches page tables, then the
hardware trap frame is copied to the normal kstack, and execution
continues.

IST stacks are kept mapped and no trampoline is needed for
NMI/doublefault, but of course page table switch is performed.

On return to usermode, the trampoline is used again, iret frame is
copied to the trampoline stack, page tables are switched and iretq is
executed. The case of iretq faulting due to the invalid usermode
context is tricky, since the frame for fault is appended to the
trampoline frame. Besides copying the fault frame and original
(corrupted) frame to kstack, the fault frame must be patched to make
it look as if the fault occurred on the kstack, see the comment in
doret_iret detection code in trap().

Currently kernel pages which are mapped during trampoline operation
are identical for all pmaps. They are registered using
pmap_pti_add_kva(). Besides initial registrations done during boot,
LDT and non-common TSS segments are registered if user requested their
use. In principle, they can be installed into kernel page table per
pmap with some work. Similarly, PCPU can be hidden from userspace
mapping using trampoline PCPU page, but again I do not see much
benefits besides complexity.

PDPE pages for the kernel half of the user page tables are
pre-allocated during boot because we need to know pml4 entries which
are copied to the top-level paging structure page, in advance on a new
pmap creation. I enforce this to avoid iterating over all
existing pmaps if a new PDPE page is needed for PTI kernel mappings.
The iteration is a known problematic operation on i386.

The need to flush hidden kernel translations on the switch to user
mode makes global tables (PG_G) meaningless and even harmful, so PG_G
use is disabled for PTI case. Our existing use of PCID is
incompatible with PTI and is automatically disabled if PTI is
enabled. PCID can be forced on only for developer's benefit.

MCE is known to be broken, it requires IST stack to operate completely
correctly even for non-PTI case, and absolutely needs dedicated IST
stack because MCE delivery while trampoline has not yet switched from PTI
stack is fatal. The fix is pending.

Reviewed by: markj (partially)
Tested by: pho (previous version)
Discussed with: jeff, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 4275e16f 10-Jan-2018 Konstantin Belousov <kib@FreeBSD.org>

Rename COMMON_TSS_RSP0 to TSS_RSP0.

The symbol is just an offset in the hardware TSS structure, it is not
limited to the common_tss instance.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 843d5752 05-Oct-2017 Konstantin Belousov <kib@FreeBSD.org>

Update comment to note that we skip LDT reload for kthreads as well.

Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 0d4e7ec5 26-Aug-2017 Ryan Libby <rlibby@FreeBSD.org>

amd64: drop q suffix from rd[fg]sbase for gas compatibility

Reviewed by: kib
Approved by: markj (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12133


# 3e902b3d 21-Aug-2017 Konstantin Belousov <kib@FreeBSD.org>

Make WRFSBASE and WRGSBASE instructions functional.

Right now, we enable the CR4.FSGSBASE bit on CPUs which support the
facility (Ivy and later), to allow usermode to read fs and gs bases
without syscalls. This bit also controls the write access to bases
from userspace, but WRFSBASE and WRGSBASE instructions currently
cannot be used, because return path from both exceptions or interrupts
overrides bases with the values from pcb.

Supporting the instructions is useful because this means that usermode
can implement green-threads completely in userspace without issuing
syscalls to change all of the machine context.

Support is implemented by saving the fs base and user gs base when
PCB_FULL_IRET flag is set. The flag is set on the context switch,
which potentially causes clobber of the bases due to activation of
another context, and when explicit modification of the user context by
a syscall or exception handler is performed. In particular, the patch
moves setting of the flag before syscalls change context.

The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on
entry from userspace can be considered bug fixes on their own.

Reviewed by: jhb (previous version)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D12023


# bd101a66 15-May-2017 Konstantin Belousov <kib@FreeBSD.org>

Ensure that resume path on amd64 only accesses page tables for normal
operation after processor is configured to allow all required
features.

In particular, NX must be enabled in EFER, otherwise load of page
table element with nx bit set causes reserved bit page fault. Since
malloc uses direct mapping for small allocations, in particular for
the suspension pcbs, and DMAP is nx after r316767, this commit tripped
fault on resume path.

Restore complete state of EFER while wakeup code is still executing
with custom page table, before calling resumectx, instead of trying to
guess which features might be needed before resumectx restored EFER on
its own.

Bisected and tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# a546448b 09-May-2015 Konstantin Belousov <kib@FreeBSD.org>

Rewrite amd64 PCID implementation to follow an algorithm described in
Vahalia's "Unix Internals", section 15.12 "Other TLB Consistency
Algorithms". The same algorithm is already utilized by the MIPS pmap
to handle ASIDs.

The PCID for the address space is now allocated per-cpu during context
switch to the thread using pmap, when no PCID on the cpu was ever
allocated, or the current PCID is invalidated. If the PCID is reused,
bit 63 of %cr3 can be set to avoid TLB flush.

Each cpu has a PCID algorithm generation count, which is saved in the
pmap pcpu block when pcpu PCID is allocated. On invalidation, the
pmap generation count is zeroed, which signals the context switch code
that already allocated PCID is no longer valid. The implication is
the TLB shootdown for the given cpu/address space, due to the
allocation of new PCID.
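
The generation scheme described above can be modelled roughly as follows.
This is a simplified sketch, not the pmap code: the struct layouts and the
names pcid_alloc(), pcid_invalidate() and make_cr3() are hypothetical, and
reserving PCID 0 for the kernel is an assumption. The only hardware facts
relied on are that PCIDs are 12 bits wide and that bit 63 of the value
loaded into %cr3 suppresses the TLB flush.

```c
#include <assert.h>
#include <stdint.h>

#define CR3_PCID_SAVE	(1ULL << 63)	/* bit 63 of %cr3: keep cached translations */
#define PCID_KERN	0u		/* assumed: PCID 0 is reserved for the kernel */
#define PCID_OVERMAX	0x1000u		/* PCIDs are 12 bits wide */

struct cpu_pcid {		/* per-CPU allocator state (hypothetical layout) */
	uint32_t gen;		/* current generation, never 0 */
	uint32_t next;		/* next unused PCID on this CPU */
};

struct pmap_pcid {		/* per-pmap, per-CPU slot (hypothetical layout) */
	uint32_t pcid;
	uint32_t gen;		/* 0 means "invalidated" */
};

/* Returns CR3_PCID_SAVE if the old PCID is reused, 0 if a flush is needed. */
static uint64_t
pcid_alloc(struct cpu_pcid *cpu, struct pmap_pcid *pm)
{
	assert(cpu->gen != 0);
	if (pm->gen == cpu->gen)
		return (CR3_PCID_SAVE);	/* still valid: reuse without a flush */
	if (cpu->next == PCID_OVERMAX) {
		/* PCID space exhausted: start a new generation. */
		if (++cpu->gen == 0)
			cpu->gen = 1;
		cpu->next = PCID_KERN + 1;
	}
	pm->pcid = cpu->next++;
	pm->gen = cpu->gen;
	return (0);			/* fresh PCID: translations must be flushed */
}

/* Invalidation only zeroes the generation; the next switch reallocates. */
static void
pcid_invalidate(struct pmap_pcid *pm)
{
	pm->gen = 0;
}

/* Compose the %cr3 value from the page table root, the PCID and the save bit. */
static uint64_t
make_cr3(uint64_t pml4_phys, const struct pmap_pcid *pm, uint64_t save)
{
	return (pml4_phys | pm->pcid | save);
}
```

In this model the first switch to a pmap returns 0, a later switch on the
same CPU returns CR3_PCID_SAVE, and a switch after pcid_invalidate()
returns 0 again with a different PCID.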

The pm_save mask no longer has to be tracked, which (significantly)
reduces the targets of the TLB shootdown IPIs. Previously, pm_save
was reset only on pmap_invalidate_all(), which made it accumulate the
cpuids of all processors on which the thread was scheduled between
full TLB shootdowns.

Besides reducing the number of TLB shootdowns and removing atomics to
update pm_saves in the context switch code, the algorithm is much
simpler than the maintenance of pm_save and selection of the right
address space in the shootdown IPI handler.

Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# b1d735ba 06-Sep-2014 John Baldwin <jhb@FreeBSD.org>

Create a separate structure for per-CPU state saved across suspend and
resume that is a superset of a pcb. Move the FPU state out of the pcb and
into this new structure. As part of this, move the FPU resume code on
amd64 into a C function. This allows resumectx() to still operate only on
a pcb and more closely mirrors the i386 code.

Reviewed by: kib (earlier version)


# 1d22d877 04-Mar-2014 Jung-uk Kim <jkim@FreeBSD.org>

Move fpusave() wrapper for suspend handler to sys/amd64/amd64/fpu.c.

Inspired by: jhb


# 603bc162 04-Mar-2014 Jung-uk Kim <jkim@FreeBSD.org>

Properly save and restore CR0.

MFC after: 3 days


# 05acaa9f 04-Mar-2014 Jung-uk Kim <jkim@FreeBSD.org>

Remove dead code since r230426, fix a comment, and tidy up.

Reported by: jhb
MFC after: 3 days


# 37eed841 30-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

Implement support for the process-context identifiers ('PCID') on
Intel CPUs. The feature tags TLB entries with the Id of the address
space and allows to avoid TLB invalidation on the context switch, it
is available only in the long mode. In the microbenchmarks, using the
PCID decreased latency of the context switches by ~30% on SandyBridge
class desktop CPUs, measured with the lat_ctx program from lmbench.

If available, use INVPCID instruction when a TLB entry in non-current
address space needs to be invalidated. The instruction is typically
available on the Haswell.

If needed, the use of PCID can be turned off with the
vm.pmap.pcid_enabled loader tunable set to 0. The state of the
feature is reported by the vm.pmap.pcid_enabled sysctl. The sysctl
vm.pmap.pcid_save_cnt reports the number of context switches which
avoided invalidating the TLB; compare with the total number of context
switches, available as sysctl vm.stats.sys.v_swtch.

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
Tested by: pho, bf


# 333d0c60 14-Jul-2012 Konstantin Belousov <kib@FreeBSD.org>

Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage
mostly meets the guidelines set by the Intel SDM:
1. We use XRSTOR and XSAVE from the same CPL using the same linear
address for the store area
2. Contrary to the recommendations, we cannot zero the FPU save area
for a new thread, since fork semantic requires the copy of the
previous state. This advice seemingly contradicts the advice
from the item 6.
3. We do use XSAVEOPT in the context switch code only, and the area
for XSAVEOPT already always contains the data saved by XSAVE.
4. We do not modify the save area between XRSTOR, when the area is
loaded into FPU context, and XSAVE. We always spit the fpu context
into save area and start emulation when directly writing into FPU
context.
5. We do not use segmented addressing to access save area, or rather,
always address it using %ds basing.
6. XSAVEOPT can be only executed in the area which was previously
loaded with XRSTOR, since context switch code checks for FPU use by
outgoing thread before saving, and thread which stopped emulation
forcibly gets its context loaded with XRSTOR.
7. The PCB cannot be paged out while FPU emulation is turned off, since
stack of the executing thread is never swapped out.

The context switch code is patched to issue XSAVEOPT instead of XSAVE
if supported. This approach eliminates one conditional in the context
switch code, which would be needed otherwise.

For user-visible machine context to have proper data, fpugetregs()
checks for unsaved extension blocks and manually copies pristine FPU
state into them, according to the description provided by CPUID leaf
0xd.

MFC after: 1 month


# f18d5bf4 06-Jul-2012 Konstantin Belousov <kib@FreeBSD.org>

Use assembler mnemonic instead of manually assembling, continuation for r238142.

Reviewed by: jhb
MFC after: 1 month


# 6ad79910 13-Jun-2012 Jung-uk Kim <jkim@FreeBSD.org>

- Remove unused code for CR3 and CR4.
- Fix few style(9) nits while I am here.


# acd7df97 13-Jun-2012 Jung-uk Kim <jkim@FreeBSD.org>

- Fix resumectx() prototypes to reflect reality.
- For i386, simply jump to resumectx() with PCB in %ecx.
- Fix a style(9) nit while I am here.


# fb864578 08-Jun-2012 Mitsuru IWASAKI <iwasaki@FreeBSD.org>

Add x86/acpica/acpi_wakeup.c for amd64 and i386. Differences in
suspend/resume procedures between them are minimized.

common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().

amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().

i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
amd64 code.

Reviewed by: attilio@, acpi@


# 49a30208 27-Feb-2012 John Baldwin <jhb@FreeBSD.org>

Update incorrect comment.


# 8c6f8f3d 21-Jan-2012 Konstantin Belousov <kib@FreeBSD.org>

Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs. As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
and the required size of FPU save area. The hw.use_xsave tunable is
provided for disabling XSAVE, and hw.xsave_mask may be used to
select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
(run-time sized) user save area on the top of the kernel stack,
right above the PCB. Reorganize the thread0 PCB initialization to
postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
well. FPU state is only useful for suspend, where it is saved in
dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
mcontext_t has a valid pointer to out-of-struct extended FPU
state. Signal handlers are supplied with stack-allocated fpu
state. The sigreturn(2) and setcontext(2) syscall honour the flag,
allowing the signal handlers to inspect and manipulate extended
state in the interrupted context.

- The getcontext(2) never returns extended state, since there is no
place in the fixed-sized mcontext_t to place variable-sized save
area. And, since mcontext_t is embedded into ucontext_t, makes it
impossible to fix in a reasonable way. Instead of extending
getcontext(2) syscall, provide a sysarch(2) facility to query
extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
consumers, making it opaque. Internally, struct fpu_kern_ctx now
contains a space for the extended state. Convert in-kernel consumers
of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after: 1 month


# 50e3cec3 22-Dec-2010 Jung-uk Kim <jkim@FreeBSD.org>

Increase size of pcb_flags to four bytes.

Requested by: bde, jhb


# e6c006d9 21-Dec-2010 Jung-uk Kim <jkim@FreeBSD.org>

Improve PCB flags handling and make it more robust. Add two new functions
for manipulating pcb_flags. These inline functions are very similar to
atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK
prefix for SMP. Add comments about the rationale[1]. Use these functions
wherever possible. Although there are some places where it is not strictly
necessary (e.g., a PCB is copied to create a new PCB), it is done across
the board for sake of consistency. Turn pcb_full_iret into a PCB flag as
it is safe now. Move rarely used fields before pcb_flags and reduce size
of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in
the neighborhood.

Reviewed by: kib
Submitted by: kib[1]
MFC after: 2 months


# cfe92f33 24-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Change ambiguous (or invalid, depending on how strict you want to be :)
assembly instruction "movw %rcx,2(%rax)" to "movw %cx,2(%rax)", since
the intent was to move 16 bits of data, in this case.
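
As a rough C analogue of the corrected instruction, the sketch below stores
only the low 16 bits of a 64-bit value at byte offset 2 of a buffer. The
function name and the buffer-length parameter are hypothetical and exist
only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * C analogue of "movw %cx,2(%rax)": %cx is the low 16 bits of %rcx,
 * and exactly two bytes are written at offset 2 of the destination.
 */
static void
store_low16_at_2(unsigned char *base, size_t len, uint64_t rcx)
{
	uint16_t cx = (uint16_t)rcx;	/* truncate to the 16-bit subregister */

	assert(len >= 4);		/* offset 2 plus 2 bytes must fit */
	memcpy(base + 2, &cx, sizeof(cx));
}
```

Bytes outside offsets 2 and 3 are left untouched, which is the behaviour
the 16-bit operand size expresses.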

Found by: clang
Reviewed by: kib


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 305c5c0a 30-Aug-2010 Jung-uk Kim <jkim@FreeBSD.org>

Save MSR_FSBASE, MSR_GSBASE and MSR_KGSBASE directly to PCB as we do not use
these values in the function.


# 3ab42a25 03-Aug-2010 Jung-uk Kim <jkim@FreeBSD.org>

savectx() has not been used for fork(2) for about 15 years. [1]
Do not clobber FPU thread's PCB as it is more harmful. When we resume CPU,
unconditionally reload FPU state.

Pointed out by: bde [1]


# a2d2c836 02-Aug-2010 Jung-uk Kim <jkim@FreeBSD.org>

- Merge savectx2() with savectx() and struct xpcb with struct pcb. [1]
savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs). Thus,
saving additional information does not hurt and it may be even beneficial.
Unfortunately, struct pcb has grown larger to accommodate more data.
Move 512-byte long pcb_user_save to the end of struct pcb while I am here.
- savectx() now saves FPU state unconditionally and copy it to the PCB of
FPU thread if necessary. This gives panic dump and kdb a chance to take
a look at the current FPU state even if the FPU is "supposedly" not used.
- Resuming CPU now unconditionally reinitializes FPU. If the saved FPU
state was irrelevant, it could be in an unknown state.

Suggested by: bde [1]


# 9727ca6a 29-Jul-2010 Jung-uk Kim <jkim@FreeBSD.org>

Fix another fallout from r208833. savectx() is used to save CPU context
for crash dump (dumppcb) and kdb (stoppcbs). For both cases, there cannot
be a valid pointer in pcb_save. This should restore the previous
behaviour.


# 39381048 29-Jul-2010 Jung-uk Kim <jkim@FreeBSD.org>

Rename PCB_USER_FPU to PCB_USERFPU not to clash with a macro from fpu.h.


# 9bfb10b1 26-Jul-2010 Jung-uk Kim <jkim@FreeBSD.org>

Re-implement FPU suspend/resume for amd64. This removes superfluous uses
of critical_enter(9) and critical_exit(9) by fpugetregs() and fpusetregs().
Also, we do not touch PCB flags any more.

MFC after: 1 month


# aaa95ccb 12-Jul-2010 Konstantin Belousov <kib@FreeBSD.org>

When switching the thread from the processor, store %dr7 content
into the pcb before disabling watchpoints. Otherwise, when the
thread is restored on a processor, watchpoints are still disabled.
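
The ordering issue can be illustrated with a toy model. The structs and
function names below are stand-ins rather than kernel definitions, and
treating "disabling watchpoints" as clearing the low enable byte of %dr7 is
an assumption made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DR7_ENABLE_MASK	0xffULL	/* L0-L3/G0-G3 enable bits live in the low byte */

struct cpu { uint64_t dr7; };		/* stand-in for the hardware register */
struct pcb { uint64_t pcb_dr7; };	/* stand-in for the saved context */

/* Switch out: save %dr7 first, and only then disable the watchpoints. */
static void
switch_out(struct cpu *c, struct pcb *p)
{
	assert(c && p);
	p->pcb_dr7 = c->dr7;
	c->dr7 &= ~DR7_ENABLE_MASK;
}

/* Switch in: reload whatever was saved for this thread. */
static void
switch_in(struct cpu *c, const struct pcb *p)
{
	assert(c && p);
	c->dr7 = p->pcb_dr7;
}
```

If the two statements in switch_out() were swapped, the pcb would record the
already-disabled value and switch_in() would bring the thread back with its
watchpoints off.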

Submitted by: Tijl Coosemans <tijl coosemans org>
(I would be much happier if Tijl committed this himself)
MFC after: 1 week


# 77cb6e6f 07-Jul-2010 Alan Cox <alc@FreeBSD.org>

Correctly maintain the per-cpu field "curpmap" on amd64 just like we
do on i386. The consequences of not doing so on amd64 became apparent
with the introduction of the COUNT_IPIS and COUNT_XINVLTLB_HITS
options. Specifically, single-threaded applications were generating
unnecessary IPIs to shoot-down the TLB on other processors. However,
this is clearly nonsensical because a single-threaded application is
only running on the current processor. The reason that this happens
is that pmap_activate() is unable to properly update the old pmap's
field "pm_active" without the correct "curpmap". So, in effect, stale
bits in "pm_active" were leading pmap_protect(), pmap_remove(),
pmap_remove_pages(), etc. to flush the TLB contents on some arbitrary
processor that wasn't even running the same application.
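
A toy model of the bookkeeping shows why "curpmap" matters. The structs and
function names are hypothetical simplifications, with pm_active reduced to a
64-bit CPU mask; the real pmap code is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pmap { uint64_t pm_active; };	/* bitmask of CPUs running this pmap */
struct pcpu {
	unsigned cpuid;
	struct pmap *curpmap;		/* pmap currently active on this CPU */
};

/* Activate newpm on this CPU, clearing this CPU's bit in the old pmap. */
static void
pmap_activate_model(struct pcpu *pc, struct pmap *newpm)
{
	uint64_t bit;

	assert(pc->cpuid < 64);
	bit = 1ULL << pc->cpuid;
	if (pc->curpmap != NULL)
		pc->curpmap->pm_active &= ~bit;	/* needs a correct curpmap */
	newpm->pm_active |= bit;
	pc->curpmap = newpm;
}

/* CPUs other than self that would receive a shootdown IPI for pm. */
static uint64_t
shootdown_targets(const struct pmap *pm, unsigned self)
{
	return (pm->pm_active & ~(1ULL << self));
}
```

When curpmap is maintained, a single-threaded process that migrates from one
CPU to another ends up with only its current CPU in pm_active, so
shootdown_targets() is empty. With a stale curpmap the old CPU's bit would
linger and attract IPIs.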

Reviewed by: kib
MFC after: 3 weeks


# 6cf9a08d 05-Jun-2010 Konstantin Belousov <kib@FreeBSD.org>

Introduce the x86 kernel interfaces to allow kernel code to use
FPU/SSE hardware. Caller should provide a save area that is chained
into the stack of the areas; pcb save_area for usermode FPU state is
on top. The pcb now contains a pointer to the current FPU saved area,
used during FPUDNA handling and context switches. There is also a
facility to allow the kernel thread to use pcb save_area.

Change the dreaded warnings "npxdna in kernel mode!" into the panics
when FPU usage is not registered.

KPI discussed with: fabient
Tested by: pho, fabient
Hardware provided by: Sentex Communications
MFC after: 1 month


# a2622e5d 09-Jul-2009 Konstantin Belousov <kib@FreeBSD.org>

Restore the segment registers and segment base MSRs for amd64 syscall
return path only when neither thread was context switched while
executing syscall code nor syscall explicitly modified LDT or MSRs.

Save segment registers in trap handlers before interrupts are enabled,
to not allow context switches to happen before registers are saved.
Use separated byte in pcb for indication of fast/full return, since
pcb_flags are not synchronized with context switches.

The change puts back syscall microbenchmark numbers that were slowed
down after commit of the support for LDT on amd64.

Reviewed by: jeff
Tested (and tested, and tested ...) by: pho
Approved by: re (kensmith)


# 2c66ccca 01-Apr-2009 Konstantin Belousov <kib@FreeBSD.org>

Save and restore segment registers when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-dependent ptrace command is added to set segment
registers for debugged process.

In collaboration with: pho
Discussed with: peter
Reviewed by: jhb
Linuxolator tested by: dchagin


# c66d2b38 16-Mar-2009 Jung-uk Kim <jkim@FreeBSD.org>

Initial suspend/resume support for amd64.

This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.


# e6493bbe 31-Jan-2009 David E. O'Brien <obrien@FreeBSD.org>

Change some movl's to mov's. Newer GAS no longer accepts 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by: jhb


# 5c0c22e9 19-Jan-2009 Konstantin Belousov <kib@FreeBSD.org>

The context switch to the 32bit binary does not properly restore
the fsbase value. The switch loads the fs segment register, that
invalidates the value in fsbase msr, thus value in %r9 can not be
considered the current value for fsbase anymore.

Unconditionally reload fsbase when switching to 32bit binary.

PR: 130526
MFC after: 3 weeks


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 3bd5e467 08-Sep-2008 Konstantin Belousov <kib@FreeBSD.org>

The pcb_gs32p should be per-cpu, not per-thread pointer. This is
location in GDT where the segment descriptor from pcb_gs32sd is
copied, and the location is in GDT local to CPU.

Noted and reviewed by: peter
MFC after: 1 week


# f98c3ea7 02-Sep-2008 Konstantin Belousov <kib@FreeBSD.org>

- When executing FreeBSD/amd64 binaries from FreeBSD/i386 or Linux/i386
processes, clear PCB_32BIT and PCB_GS32BIT bits [1].

- Reread the fs and gs bases from the msr unconditionally, not believing
the values in pcb_fsbase and pcb_gsbase, since usermode may reload
segment registers, invalidating the cache. [2].

Both problems resulted in the wrong fs base, causing the wrong TLS pointer
to be dereferenced in usermode.

Reported and tested by: Vyacheslav Bocharov <adeepv at gmail com> [1]
Reported by: Bernd Walter <ticsoat cicely7 cicely de>,
Artem Belevich <fbsdlist at src cx>[2]
Reviewed by: peter
MFC after: 3 days


# 8f4a1f3a 30-Jul-2008 Konstantin Belousov <kib@FreeBSD.org>

Bring back the save/restore of the %ds, %es, %fs and %gs registers for
the 32bit images on amd64.

Change the semantic of the PCB_32BIT pcb flag to request the context
switch code to operate on the segment registers. Its previous meaning
of saving or restoring the %gs base offset is assigned to the new
PCB_GS32BIT flag.

FreeBSD 32bit image activator sets the PCB_32BIT flag, while Linux 32bit
emulation sets PCB_32BIT | PCB_GS32BIT.

Reviewed by: peter
MFC after: 2 weeks


# f001eabf 23-Mar-2008 Peter Wemm <peter@FreeBSD.org>

First pass at (possibly futile) microoptimizing of cpu_switch. Results
are mixed. Some pure context switch microbenchmarks show up to 29%
improvement. Pipe based context switch microbenchmarks show up to 7%
improvement. Real world tests are far less impressive as they are
dominated more by actual work than switch overheads, but depending on
the machine in question, workload, kernel options, phase of moon, etc, a
few percent gain might be seen.

Summary of changes:
- don't reload MSR_[FG]SBASE registers when context switching between
non-threaded userland apps. These typically cost 120 clock cycles each
on an AMD cpu (less on Barcelona/Phenom). Intel cores are probably no
faster on this.
- The above change only helps unthreaded userland apps that tend to use
the same value for gsbase. Threaded apps will get no benefit from this.
- reorder things like accessing the pcb to be in memory order, to give
prefetching a better chance of working. Operations are now in increasing
memory address order, rather than reverse or random.
- Push some lesser used code out of the main code paths. Hopefully
allowing better code density in cache lines. This is probably futile.
- (part 2 of previous item) Reorder code so that branches have a more
realistic static branch prediction hint. Both Intel and AMD cpus
default to predicting branches to lower memory addresses as being
taken, and to higher memory addresses as not being taken. This is
overridden by the limited dynamic branch prediction subsystem. A trip
through userland might overflow this.
- Futile attempt at spreading the use of the results of previous operations
in new operations. Hopefully this will allow the cpus to execute in
parallel better.
- stop wasting 16 bytes at the top of kernel stack, below the PCB.
- Never load the userland fs/gsbase registers for kthreads, but preserve
curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)
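
The first item in the list, skipping the MSR reload when the base does not
change, can be sketched as a compare-before-write. This is a model only:
struct pcb is reduced to the two cached bases, load_bases() is a
hypothetical name, and a counter stands in for the expensive MSR writes.

```c
#include <assert.h>
#include <stdint.h>

struct pcb {			/* reduced stand-in for the real pcb */
	uint64_t pcb_fsbase;
	uint64_t pcb_gsbase;
};

static unsigned long msr_writes;	/* counts the simulated MSR writes */

/* Reload a base only when the incoming thread's value differs. */
static void
load_bases(const struct pcb *oldpcb, const struct pcb *newpcb)
{
	assert(oldpcb && newpcb);
	if (newpcb->pcb_fsbase != oldpcb->pcb_fsbase)
		msr_writes++;	/* real code would write the fs base MSR here */
	if (newpcb->pcb_gsbase != oldpcb->pcb_gsbase)
		msr_writes++;	/* real code would write the user gs base MSR here */
}
```

Two unthreaded processes that share the same base values switch without any
write in this model, whereas threaded programs, whose bases differ per
thread, pay for the write each time, matching the caveat in the second item.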

Microbenchmarking this code seems to be really sensitive to things like
scheduling luck, timing, cache behavior, tlb behavior, kernel options,
other random code changes, etc.

While it doesn't help heavy userland workloads much, it does help high
context switch loads a little, and should help those that involve
switching via kthreads a bit more.

A special thanks to Kris for the testing and reality checks, and Jeff for
tormenting me into doing this. :)

This is still work-in-progress.


# ea497502 21-Aug-2007 Joseph Koshy <jkoshy@FreeBSD.org>

Assign sizes to assembly language support functions.

Approved by: re (kensmith)


# 40380a6a 17-Jul-2007 Jeff Roberson <jeff@FreeBSD.org>

- Optimize the amd64 cpu_switch() TD_LOCK blocking and releasing to
require fewer blocking loops.
- Don't use atomic ops with 4BSD or on UP.
- Only use the blocking loop if ULE is compiled in.
- Use the correct memory barrier.

Discussed with: attilio, jhb, ssouhlal
Tested by: current@
Approved by: re


# 42ce445f 06-Jun-2007 David Xu <davidxu@FreeBSD.org>

Backout experimental adaptive-spin umtx code.


# 5d68dad3 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

- Add a new argument to cpu_switch. This is a pointer to a mutex that
oldthread should point at before we return.
- When cpu_switch() is called the td_lock pointer in the old thread may
point at the blocked lock. This prevents other processors from
switching into this thread while we're still switching out. Wait
until we're done deactivating the vmspace before we release the
thread by assigning to td_lock.
- Before we can activate the new vmspace we must make sure that the new
thread is not assigned to the blocked lock. It may be in the process
of switching out on another cpu. Spin until the new thread is
available.
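
The td_lock handoff described above can be sketched in C11 atomics. This is
a sketch under assumptions rather than the code in cpu_switch.S: the struct
layouts are invented for illustration, and release_old_thread() and
wait_for_new_thread() are hypothetical helper names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Placeholder mutex type; only its address matters in this sketch. */
struct mtx {
	int unused;
};

/* Sentinel a thread points at while it is still being switched out. */
static struct mtx blocked_lock;

/* Hypothetical, reduced thread structure. */
struct thread {
	_Atomic(struct mtx *) td_lock;
};

/*
 * Called once the old thread's vmspace has been deactivated: publishing
 * the new lock pointer lets other cpus switch into the old thread.
 */
static void
release_old_thread(struct thread *oldtd, struct mtx *newlock)
{
	atomic_store_explicit(&oldtd->td_lock, newlock, memory_order_release);
}

/*
 * Called before activating the new thread's vmspace: spin while another
 * cpu is still switching that thread out.
 */
static void
wait_for_new_thread(struct thread *newtd)
{
	while (atomic_load_explicit(&newtd->td_lock,
	    memory_order_acquire) == &blocked_lock)
		;	/* spin until the other cpu releases the thread */
}
```

The release/acquire pairing is one plausible choice of barrier for this
pattern; the commit itself does not say which barrier the assembly uses.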


# 9c5b213e 29-Mar-2007 Jung-uk Kim <jkim@FreeBSD.org>

MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.

Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by: emulation


# 4e32b7b3 19-Dec-2006 David Xu <davidxu@FreeBSD.org>

Add a lwpid field to the per-cpu structure; the lwpid is the id of the
thread currently running on each cpu. This allows us to add in-kernel
adaptive spinning for user-level mutexes. While spinning in user space is
possible, without the correct thread running state exported from the
kernel it can hardly be implemented efficiently without wasting cpu
cycles, and exporting the thread running state is unlikely to be
implemented soon because interfaces have to be designed and stabilized.
This implementation is transparent to user space and can be disabled
dynamically. With this change, the performance of a mutex ping-pong
program improves massively on SMP machines. Performance of the mysql
super-smack select benchmark increases by about 7% on an Intel dual
dual-core2 Xeon machine, which indicates that on systems with many cpus
and low system-call overhead (athlon64, opteron, and core-2 are known to
be fast), the adaptive spin does help performance.

Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used as the spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130


# 9313eb55 17-Oct-2005 David Xu <davidxu@FreeBSD.org>

Micro optimization for context switch. Eliminate code for saving gs.base
and fs.base. We always update pcb.pcb_gsbase and pcb.pcb_fsbase
when user wants to set them, in context switch routine, we only need to
write them into registers, we never have to read them out from registers
when thread is switched away. Since rdmsr is a serialization instruction,
micro benchmark shows it is worthy to do.

Reviewed by: peter, jhb


# 1acc225f 27-Sep-2005 Peter Wemm <peter@FreeBSD.org>

Kill pcb_rflags. It served no purpose.

Reported by: bde


# 98df9e00 27-Sep-2005 Peter Wemm <peter@FreeBSD.org>

Fix a minor nit that has been bugging me for a while. Fix the obvious
cases of using a 64 bit operation to zero a register. 32 bit opcodes are
smaller and supposedly faster, and clear the upper 32 bits for free.


# 2c87e001 16-Aug-2004 Peter Wemm <peter@FreeBSD.org>

Sync with i386 - s/cpu_swtch/cpu_switch/


# df4fd277 16-May-2004 Peter Wemm <peter@FreeBSD.org>

Checkpoint some of what I was starting to tinker with for having some
different context support for 32 vs 64 bit processes. This simply omits
the save/restore of the segment selector registers for non 32 bit
processes. This avoids the rdmsr/wrmsr juggling when restoring %gs
clobbers the kernel msr that holds the gsbase.

However, I suspect it might be better to conditionally do this at
user<->kernel transition where we wouldn't need to do the juggling in the
first place. Or have per-thread extended context save/restore hooks.


# 12c1418c 16-May-2004 Peter Wemm <peter@FreeBSD.org>

Kill the LAZYPMAP ifdefs. While they worked, they didn't do anything
to help the AMD cpus (which have a hardware tlb flush filter). I held
off to see what the 64 bit Intel cpus did, but it doesn't seem to help
much there either. Oh well, store it in the Attic.


# 9a80fddc 05-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999 and email from Peter Wemm.

Approved by: core, peter


# 51621230 06-Feb-2004 Peter Wemm <peter@FreeBSD.org>

Remove the badsw* INVARIANTS checks. The events that this attempts
to catch are already nicely caught by trapping the null pointer derefs.
Remove no-longer-used noswitch/nothrow strings. They were referenced
by the stub cpu_switch() etc functions before they were implemented.
Try something a little different for the lock prefixes.

Prompted by: bde (the first two items anyway)


# db527225 28-Jan-2004 Peter Wemm <peter@FreeBSD.org>

Take another shot at the invariants calls to __panic. They hadn't been
updated for the regparm ABI on amd64.
Context switch debug regs.
Update for fpu simplification
Don't needlessly reload %cr3, in case the cpu has the tlb flush filter
turned off. Re-add LAZY_SWITCH stubs.


# ccaa41bc 22-Jan-2004 Peter Wemm <peter@FreeBSD.org>

Unbreak amd64: Rename calls from panic to __panic


# 0d2a2989 17-Nov-2003 Peter Wemm <peter@FreeBSD.org>

Initial landing of SMP support for FreeBSD/amd64.

- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
(in particular, bootstrap)
- This is still a WIP. It seems that there are some highly bogus bioses
on nVidia nForce3-150 boards. I can't stress how broken these boards
are. I have a workaround in mind, but right now the Asus SK8N is broken.
The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
atpic' in addition, because they somehow managed to forget to connect the
8254 timer to the apic, even though it's in the same silicon! ARGH!
This directly violates the ACPI spec.


# fcfe57d6 07-Nov-2003 Peter Wemm <peter@FreeBSD.org>

Update the graffiti.


# bf2f09ee 07-Nov-2003 Peter Wemm <peter@FreeBSD.org>

The great s/npx/fpu/gi


# c0a54ff6 14-May-2003 Peter Wemm <peter@FreeBSD.org>

Collect the nastiness for preserving the kernel MSR_GSBASE around the
load_gs() calls into a single place that is less likely to go wrong.

Eliminate the per-process context switching of MSR_GSBASE, because it
should be constant for a single cpu. Instead, save/restore it during
the loading of the new %gs selector for the new process.

Approved by: re (amd64/* blanket)


# d85631c4 13-May-2003 Peter Wemm <peter@FreeBSD.org>

Add BASIC i386 binary support for the amd64 kernel. This is largely
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls. It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4. The ia64 code has not implemented signal delivery, so I had
to do that.

Before you say it, yes, this does need to go in a common place. But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.

On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure. The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting. The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes. This still needs
to be optimized.

Approved by: re (amd64/* blanket)


# bf1e8974 11-May-2003 Peter Wemm <peter@FreeBSD.org>

Give a %fs and %gs to userland. Use swapgs to obtain the kernel %GS.base
value on entry and exit. This isn't as easy as it sounds because when
we recursively trap or interrupt, we have to avoid duplicating the
swapgs instruction or we end up back with the userland %gs. I implemented
this by testing TF_CS to see if we're coming from supervisor mode
already, and check for returning to supervisor. To avoid a race with
interrupts in the brief period after beginning executing the handler and
before the swapgs, convert all trap gates to interrupt gates, and reenable
interrupts immediately after the swapgs. I am not happy with this.
There are other possible ways to do this that should be investigated.
(eg: storing the GS.base MSR value in the trapframe)

Add some sysarch functions to let the userland code get to this.

Approved by: re (blanket amd64/*)


# afa88623 30-Apr-2003 Peter Wemm <peter@FreeBSD.org>

Commit MD parts of a loosely functional AMD64 port. This is based on
a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to
attempt to get a stable base to start from. There is a lot missing still.
Worth noting:
- The kernel runs at 1GB in order to cheat with the pmap code. pmap uses
a variation of the PAE code in order to avoid having to worry about 4
levels of page tables yet.
- It boots in 64 bit "long mode" with a tiny trampoline embedded in the
i386 loader. This simplifies locore.s greatly.
- There are still quite a few fragments of i386-specific code that have
not been translated yet, and some that I cheated and wrote dumb C
versions of (bcopy etc).
- It has both int 0x80 for syscalls (but using registers for argument
passing, as is native on the amd64 ABI), and the 'syscall' instruction
for syscalls. int 0x80 preserves all registers, 'syscall' does not.
- I have tried to minimize looking at the NetBSD code, except in a couple
of places (eg: to find which register they use to replace the trashed
%rcx register in the syscall instruction). As a result, there is not a
lot of similarity. I did look at NetBSD a few times while debugging to
get some ideas about what I might have done wrong in my first attempt.


# c81e825f 05-Apr-2003 Peter Wemm <peter@FreeBSD.org>

Unbreak the !LAZY_SWITCH case. I #ifdef'ed too much when I added
the ifdefs prior to commit and killed the same-address-space test.

Submitted by: bde


# cc66ebe2 02-Apr-2003 Peter Wemm <peter@FreeBSD.org>

Commit a partial lazy thread switch mechanism for i386. It isn't as lazy
as it could be and can do with some more cleanup. Currently it's under
options LAZY_SWITCH. What this does is avoid %cr3 reloads for short
context switches that do not involve another user process. ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb. However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still. There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

It's not compiled in unless you add the LAZY_SWITCH option. I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff. This was more
meant for the kthread case, while his was for interrupts. Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place. This allows us to catch the silly
case of doing a cpu_switch() to the current process. This happens
uncomfortably often. This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle). This has been
implemented on i386 and (thanks to jake) sparc64. The others will come
soon. This is actually separate from the lazy switch stuff.

Glanced at by: jake, jhb


# 2fbe601a 22-Jan-2003 Peter Wemm <peter@FreeBSD.org>

Now that TPR isn't bogusly raised at boot, there is no need to clear
it at context switch.


# e344afe7 20-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Move SWTCH_OPTIM_STATS related code out of cpufunc.h. (This sort of stat
gathering is not an x86 cpu feature)


# 33d7ad1a 12-Jul-2002 John Baldwin <jhb@FreeBSD.org>

Set the thread state of the newly chosen to run thread to TDS_RUNNING in
choosethread() in MI C code instead of doing it in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.

Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64


# e602ba25 29-Jun-2002 Julian Elischer <julian@FreeBSD.org>

Part 1 of KSE-III

The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# d74ac681 26-Mar-2002 Matthew Dillon <dillon@FreeBSD.org>

Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


# 181df8c9 26-Feb-2002 Matthew Dillon <dillon@FreeBSD.org>

revert last commit temporarily due to whining on the lists.


# f96ad4c2 26-Feb-2002 Matthew Dillon <dillon@FreeBSD.org>

STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().

This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not affected by this change.

This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.

This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.

The following changes have been made:

* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.

Other architectures are unaffected.

* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather than hardcoded
in MI code.

The new code places the burden of entering the critical section
in the trampoline code where it belongs.

* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.

* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.

* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.

* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.

* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.

* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to lock bus cycles modifying them.

Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.

* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.
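
The counter-based critical sections and interrupt deferral listed above can
be modelled in a few lines of C. This is a simplified sketch, not the i386
implementation: struct cpu_sim, interrupt_arrives() and run_handler() are
hypothetical, only the ipending mask is modelled (spending and fpending are
left out), and no real interrupt hardware or icu_lock is involved.

```c
#include <stdint.h>

/*
 * Hypothetical per-cpu state for illustration; field names follow the
 * commit text, but this is not the kernel's real layout.
 */
struct cpu_sim {
	int td_critnest;	/* critical section nesting count */
	int int_pending;	/* nonzero when something was deferred */
	uint32_t ipending;	/* mask of deferred interrupt numbers */
	uint32_t handled;	/* mask of interrupts actually serviced */
};

static void
run_handler(struct cpu_sim *c, unsigned irq)
{
	c->handled |= (uint32_t)1 << irq;
}

/* Interrupt entry; irq must be in the range 0..31. */
static void
interrupt_arrives(struct cpu_sim *c, unsigned irq)
{
	if (c->td_critnest > 0) {
		/* In a critical section: record the interrupt and return. */
		c->ipending |= (uint32_t)1 << irq;
		c->int_pending = 1;
	} else {
		run_handler(c, irq);
	}
}

/* Entering a critical section is just a counter increment. */
static void
critical_enter(struct cpu_sim *c)
{
	c->td_critnest++;
}

/* Leaving the outermost critical section replays deferred interrupts. */
static void
critical_exit(struct cpu_sim *c)
{
	unsigned irq;

	if (--c->td_critnest != 0 || !c->int_pending)
		return;
	c->int_pending = 0;
	for (irq = 0; irq < 32; irq++) {
		if (c->ipending & ((uint32_t)1 << irq)) {
			c->ipending &= ~((uint32_t)1 << irq);
			run_handler(c, irq);
		}
	}
}
```

Because the state is per-cpu, the sketch uses plain loads and stores, which
mirrors the commit's remark that locked bus cycles are not currently needed.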

Performance issues:

One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.

The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather than defer.

The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).

The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().

Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.


# 620080d0 07-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Attempt to patch up some style bugs introduced in the previous commit


# 079b7bad 07-Feb-2002 Julian Elischer <julian@FreeBSD.org>

Pre-KSE/M3 commit.
This is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embedded there,
but remove all the code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


# e744f309 17-Jan-2002 Bruce Evans <bde@FreeBSD.org>

Changed the type of pcb_flags from u_char to u_int and adjusted things.
This removes the only atomic operation on a char type in the entire
kernel.


# 0bbc8826 11-Dec-2001 John Baldwin <jhb@FreeBSD.org>

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


# d01404c8 29-Oct-2001 John Baldwin <jhb@FreeBSD.org>

Fix a typo in comment and #ifdef fixes: GRAP_PRIO -> GRAB_PRIO so that
x86 SMP kernels actually boot again to single user mode.

Pointy hat to: jhb
Noticed by: jlemon


# 9869fa1d 28-Oct-2001 John Baldwin <jhb@FreeBSD.org>

- More whitespace and comment cleanups.
- Remove unused sw1a label. A breakpoint can be set in choosethread() for
the same effect.

Reviewed by: bde
Submitted by: bde (partly)


# e0e30307 25-Oct-2001 John Baldwin <jhb@FreeBSD.org>

Currently no code does a CROSSJUMP() to sw1a, so we don't need a
CROSSJUMPTARGET() for it.

Submitted by: bde


# 02c41f11 25-Oct-2001 John Baldwin <jhb@FreeBSD.org>

Use %ecx instead of %ebx for the scratch register while updating %dr7 since
%ecx isn't a call safe register and thus we don't have to save and restore
it.

Submitted by: bde


# 7df8a724 25-Oct-2001 John Baldwin <jhb@FreeBSD.org>

- Fix typo in comment from previous revision.
- Fix a bug in the LDT changes where the wrong argument was passed to
set_user_ldt() from cpu_switch(). The bug was passing a pointer to the
ldt, but set_user_ldt() takes a pointer to the process' mdproc structure.

Submitted by: bde


# 163fd6fb 25-Oct-2001 John Baldwin <jhb@FreeBSD.org>

Whitespace, comment, and string fixes.

Submitted by: bde (mostly)


# 24db0459 24-Oct-2001 John Baldwin <jhb@FreeBSD.org>

Split the per-process Local Descriptor Table out of the PCB and into
struct mdproc.

Submitted by: Andrew R. Reiter <arr@watson.org>
Silence on: -current


# 2f01a0c0 18-Sep-2001 Peter Wemm <peter@FreeBSD.org>

Fix a mistake I made with the pcb movement relative to the stack in the
KSE patch. We need to leave the 16 bytes here for enabling the trapframe
to be converted to a vm86trapframe if we're switching *to* a vm86 context.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
Make the kernel aware that there are smaller units of scheduling than the
process (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 3ad234d4 18-Jul-2001 Brian S. Dean <bsd@FreeBSD.org>

swtch.s: During context save, use the correct bit mask for clearing
the non-reserved bits of dr7.

During context restore, load dr7 in such a way as to not
disturb reserved bits.

machdep.c: Don't explicitly disallow the setting of the reserved bits
in dr7 since we now keep from setting them when we load dr7
from the PCB.

This allows one to write back the dr7 value obtained from
the system without triggering an EINVAL (one of the
reserved bits always seems to be set after taking a trace
trap).
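
The restore-side behaviour described above amounts to a masked merge. The
following C sketch shows the idea only; the mask value 0x0000fc00 (bits
10-15) is an assumption made for illustration and is not stated in the
commit, and dr7_merge() is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed reserved-bit mask for %dr7 (bits 10-15); not from the commit. */
#define DR7_RESERVED	0x0000fc00u

/*
 * Build the value to load into %dr7: reserved bits come from the cpu's
 * current register contents, everything else from the value saved in
 * the PCB.
 */
static uint32_t
dr7_merge(uint32_t current_dr7, uint32_t saved_dr7)
{
	return (current_dr7 & DR7_RESERVED) | (saved_dr7 & ~DR7_RESERVED);
}
```

With a merge like this, a saved value that happens to carry reserved bits
can be written back without changing those bits in the hardware register.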

MFC after: 7 days


# c2b095ab 20-May-2001 Bruce Evans <bde@FreeBSD.org>

Use a critical region to protect saving of the npx state in savectx().
Not doing this was fairly harmless because savectx() is only called
for panic dumps and the bug could at worst reset the state.

savectx() is still missing saving of (volatile) debug registers, and
still isn't called for core dumps.


# 8bd57f8f 15-May-2001 John Baldwin <jhb@FreeBSD.org>

Remove unneeded includes of sys/ipl.h and machine/ipl.h.


# 02318dac 24-Feb-2001 Jake Burkholder <jake@FreeBSD.org>

Remove the leading underscore from all symbols defined in x86 asm
and used in C or vice versa. The elf compiler uses the same names
for both. Remove asnames.h with great prejudice; it has served its
purpose.

Note that this does not affect the ability to generate an aout kernel
due to gcc's -mno-underscores option.

moral support from: peter, jhb


# f1532aad 22-Feb-2001 Peter Wemm <peter@FreeBSD.org>

Activate USER_LDT by default. The new thread libraries are going to
depend on this. The linux ABI emulator tries to use it for some linux
binaries too. VM86 had a bigger cost than this and it was made default
a while ago.

Reviewed by: jhb, imp


# 5813dc03 19-Feb-2001 John Baldwin <jhb@FreeBSD.org>

- Don't call clear_resched() in userret(), instead, clear the resched flag
in mi_switch() just before calling cpu_switch() so that the first switch
after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
proc in mi_switch(), and handle the sched_lock state change across a
context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
have to set up an initial state for sched_lock in fork_exit() before we
release it.


# d5a08a60 11-Feb-2001 Jake Burkholder <jake@FreeBSD.org>

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propagated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propagation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propagate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at appropriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.
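
A unified array of run queues of the kind described above might look
roughly like the C sketch below. Only the count of 64 queues comes from the
commit; the 256 priority levels, the status bitmap, the per-queue counters
standing in for real process lists, and all the names are assumptions made
for illustration.

```c
#include <stdint.h>

#define RQ_NQS		64			/* number of queues, per the commit */
#define PRI_LEVELS	256			/* assumed number of priority levels */
#define RQ_PPQ		(PRI_LEVELS / RQ_NQS)	/* priorities per queue */

/* Hypothetical run queue; the queue is passed in rather than global. */
struct runq_sim {
	uint64_t status;	/* bit q is set while queue q is non-empty */
	int count[RQ_NQS];	/* stand-in for the per-queue process lists */
};

/* Queue an entry at priority pri, which must be in 0..PRI_LEVELS-1. */
static void
runq_add_sim(struct runq_sim *rq, int pri)
{
	int q = pri / RQ_PPQ;

	rq->count[q]++;
	rq->status |= (uint64_t)1 << q;
}

/*
 * Remove one entry from the best (lowest-numbered) non-empty queue and
 * return that queue's index, or -1 if every queue is empty.
 */
static int
runq_choose_sim(struct runq_sim *rq)
{
	int q;

	for (q = 0; q < RQ_NQS; q++) {
		if (rq->status & ((uint64_t)1 << q)) {
			if (--rq->count[q] == 0)
				rq->status &= ~((uint64_t)1 << q);
			return (q);
		}
	}
	return (-1);
}
```

Taking the queue as a parameter is what would let the same functions serve
per-cpu run queues later, as the commit intends.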


# d888fc4e 11-Feb-2001 Mark Murray <markm@FreeBSD.org>

RIP <machine/lock.h>.

Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb


# 142ba5f3 09-Feb-2001 John Baldwin <jhb@FreeBSD.org>

- Make astpending and need_resched process attributes rather than CPU
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accessed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.


# 7dd2de5b 19-Jan-2001 Jake Burkholder <jake@FreeBSD.org>

Rename the ASSYM MTX_RECURSE to MTX_RECURSECNT in order to not conflict
with the flag of the same name.


# 558226ea 19-Jan-2001 Peter Wemm <peter@FreeBSD.org>

Use #ifdef DEV_NPX from opt_npx.h instead of #if NNPX > 0 from npx.h


# 41ed17bf 06-Jan-2001 Jake Burkholder <jake@FreeBSD.org>

Use %fs to access per-cpu variables in uni-processor kernels the same
as multi-processor kernels. The old way made it difficult for kernel
modules to be portable between uni-processor and multi-processor
kernels. It is no longer necessary to jump through hoops.

- always load %fs with the private segment on entry to the kernel
- change the type of the self referential pointer from struct privatespace
to struct globaldata
- make the globaldata symbol have value 0 in all cases, so the symbols
in globals.s are always offsets, not aliases for fields in globaldata
- define the globaldata space used for uniprocessor kernels in C, rather
than assembler
- change the assembly language accessors to use %fs, add a macro
PCPU_ADDR(member, reg), which loads the register reg with the address
of the per-cpu variable member


# 7d8e3aa0 13-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

Use _lapic+offset to access the local apic from assembly language
files, rather than the symbols in globals.s. The offsets are
generated by genassym.


# 6d43764a 13-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

Introduce a new potentially cleaner interface for accessing per-cpu
variables from i386 assembly language. The syntax is PCPU(member)
where member is the capitalized name of the per-cpu variable, without
the gd_ prefix. Example: movl %eax,PCPU(CURPROC). The capitalization
is due to using the offsets generated by genassym rather than the symbols
provided by linking with globals.o. asmacros.h is the wrong place for
this but it seemed as good a place as any for now. The old implementation
in asnames.h has not been removed because it is still used to de-mangle
the symbols used by the C variables for the UP case.


# 1b00f920 08-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

Revert the previous change I made to cpu_switch. It doesn't help as
much as I thought it would and according to bde was a pessimization.


# 1306962a 02-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

Change cpu_switch to explicitly popl the callers program counter and
pushl that of the new process, rather than doing a movl (%esp) and
assuming that the stack has been set up right. This makes the initial
stack setup slightly more sane, and will make it easier to stick
an interrupted process onto the run queue without its knowing.


# 835a748f 17-Nov-2000 John Baldwin <jhb@FreeBSD.org>

- Change extra sanity checks in cpu_switch() to be conditional on INVARIANTS
instead of DIAGNOSTIC.
- Remove the p_wchan check as it no longer applies since a process may be
switched out during CURSIG() within msleep() or mawait().
- Remove an extra sanity check only needed during the early SMPng work.


# ac5f943c 13-Oct-2000 Peter Wemm <peter@FreeBSD.org>

savectx() is now used exclusively by the crash dump system. Move the
i386 specific gunk (copy %cr3 to the pcb) from the MI dumpsys() to the
MD savectx().


# e4a85a9b 08-Oct-2000 Bruce Evans <bde@FreeBSD.org>

Unremoved used include of <machine/ipl.h>. Removing it in rev.1.95
significantly pessimized syscalls by arranging to do null rescheduling
on return from every syscall. (AST_RESCHED was not defined, and the
mask ~AST_RESCHED gets replaced by the useless mask ~0. This bug has
been fixed before, in rev.1.92.)


# 6c567274 05-Oct-2000 John Baldwin <jhb@FreeBSD.org>

- Change fast interrupts on x86 to push a full interrupt frame and to
return through doreti to handle ast's. This is necessary for the
clock interrupts to work properly.
- Change the clock interrupts on the x86 to be fast instead of threaded.
This is needed because both hardclock() and statclock() need to run in
the context of the current process, not in a separate thread context.
- Kill the prevproc hack as it is no longer needed.
- We really need Giant when we call psignal(), but we don't want to block
during the clock interrupt. Instead, use two p_flag's in the proc struct
to mark the current process as having a pending SIGVTALRM or a SIGPROF
and let them be delivered during ast() when hardclock() has finished
running.
- Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways. It was
broken on the x86 if it was turned on since cpl is gone. Its only use
was to bogusly run softclock() directly during hardclock() rather than
scheduling an SWI.
- Remove the COM_LOCK simplelock and replace it with a clock_lock spin
mutex. Since the spin mutex already handles disabling/restoring
interrupts appropriately, this also lets us axe all the *_intr() fu.
- Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use
temporary fast interrupts for the APIC trial.
- Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending
signals in hardclock() that are to be delivered in ast().

Submitted by: jakeb (making statclock safe in a fast interrupt)
Submitted by: cp (concept of delaying signals until ast())


# 1931cf94 05-Oct-2000 John Baldwin <jhb@FreeBSD.org>

- Heavyweight interrupt threads on the alpha for device I/O interrupts.
- Make softinterrupts (SWI's) almost completely MI, and divorce them
completely from the x86 hardware interrupt code.
- The ihandlers array is now gone. Instead, there is a MI shandlers array
that just contains SWI handlers.
- Most of the former machine/ipl.h files have moved to a new sys/ipl.h.
- Stub out all the spl*() functions on all architectures.

Submitted by: dfr


# 72b535ef 21-Sep-2000 Mike Smith <msmith@FreeBSD.org>

Implement halt-on-idle in the !SMP case, which should significantly
reduce power consumption on most systems.


# 0384fff8 06-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 93367624 10-May-2000 Peter Wemm <peter@FreeBSD.org>

Move <machine/ipl.h> outside #ifdef SMP because it supplies AST_RESCHED.
Without this, it shows up as an undefined symbol in /kernel. (!)
(This looks very freaky when doing a nm /kernel!)


# bd5caafc 28-Mar-2000 Matthew Dillon <dillon@FreeBSD.org>

The SMP cleanup commit broke need_resched; this fixes that and also
removes unnecessary MPLOCKED and 'lock' prefixes from the interrupt
nesting level, since (A) the MP lock is held at the time, and (B) since
the nesting level is restored prior to return, any interrupted code
will see a consistent value.


# 36e9f877 28-Mar-2000 Matthew Dillon <dillon@FreeBSD.org>

Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).

A great deal of conditional SMP code for various deadended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.

Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.

This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.

This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch


# f8515dd8 02-Jan-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Move the "sti" instruction to right before the "hlt" to close a tiny
race condition.

Obtained from: bde and/or obrien


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 28f31ccf 18-Aug-1999 Peter Wemm <peter@FreeBSD.org>

Use the MI process selection. We use a quick routine to decide whether
to get the mplock and enter the kernel to run a process in the SMP case.


# eec2e836 10-Jul-1999 Bruce Evans <bde@FreeBSD.org>

Go back to the old (icu.s rev.1.7 1993) way of keeping the AST-pending
bit separate from ipending, since this is simpler and/or necessary for
SMP and may even be better for UP.

Reviewed by: alc, luoqi, tegge


# ab001a72 08-Jul-1999 Jonathan Lemon <jlemon@FreeBSD.org>

Implement support for hardware debug registers on the i386.

Submitted by: Brian Dean <brdean@unx.sas.com>


# 789fb7cc 03-Jul-1999 Alan Cox <alc@FreeBSD.org>

An SMP-specific change: Add the lock prefix to RMW operations
on ipending.


# eb9d435a 01-Jun-1999 Jonathan Lemon <jlemon@FreeBSD.org>

Unifdef VM86.

Reviewed by: silence on -current


# 0f0fe5a4 12-May-1999 Luoqi Chen <luoqi@FreeBSD.org>

Unbreak VESA on SMP.


# ea2b3e3d 06-May-1999 Bruce Evans <bde@FreeBSD.org>

Fixed profiling of elf kernels. Made high resolution profiling compile
for elf kernels (it is broken for all kernels due to lack of egcs support).

Renaming of many assembler labels is avoided by declaring
the labels that need to be visible to gprof as having type "function"
and depending on the elf version of gprof being zealous about discarding
the others. A few type declarations are still missing, mainly for SMP.

PR: 9413
Submitted by: Assar Westerlund <assar@sics.se> (initial parts)


# 5206bca1 27-Apr-1999 Luoqi Chen <luoqi@FreeBSD.org>

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fast_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


# 087e80a9 02-Apr-1999 Alan Cox <alc@FreeBSD.org>

Put in place the infrastructure for improved UP and SMP TLB management.

In particular, replace the unused field pmap::pm_flag by pmap::pm_active,
which is a bit mask representing which processors have the pmap activated.
(Thus, it is a simple Boolean on UPs.)
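
The bit-mask idea described above can be sketched in C. This is only an
illustration, not the code from the commit: struct pmap_sketch, the helper
names, and the 32-bit mask width are hypothetical, and the "needs remote
flush" check is an assumed use of the mask rather than something the commit
message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's struct pmap; only the field
 * discussed in the commit message is modelled here.
 */
struct pmap_sketch {
	uint32_t pm_active;	/* bit n set => CPU n has this pmap activated */
};

/* Record that the given CPU has activated this pmap. */
static void
pmap_mark_active(struct pmap_sketch *pm, unsigned cpu)
{
	assert(cpu < 32);
	pm->pm_active |= (uint32_t)1 << cpu;
}

/* Record that the given CPU has switched away from this pmap. */
static void
pmap_mark_inactive(struct pmap_sketch *pm, unsigned cpu)
{
	assert(cpu < 32);
	pm->pm_active &= ~((uint32_t)1 << cpu);
}

/* True if the given CPU currently has this pmap activated. */
static bool
pmap_is_active_on(const struct pmap_sketch *pm, unsigned cpu)
{
	assert(cpu < 32);
	return (pm->pm_active & ((uint32_t)1 << cpu)) != 0;
}

/*
 * Assumed use of the mask: any CPU other than 'self' that has the pmap
 * activated might hold stale TLB entries for it.
 */
static bool
pmap_needs_remote_flush(const struct pmap_sketch *pm, unsigned self)
{
	assert(self < 32);
	return (pm->pm_active & ~((uint32_t)1 << self)) != 0;
}
```

On a uniprocessor only bit 0 is ever set, so the mask degenerates to the
simple Boolean the message mentions.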

Also, eliminate an unnecessary memory reference from cpu_switch()
in swtch.s.

Assisted by: John S. Dyson <dyson@iquest.net>
Tested by: Luoqi Chen <luoqi@watermarkgroup.com>,
Poul-Henning Kamp <phk@critter.freebsd.dk>


# 2618393e 20-Mar-1999 Alan Cox <alc@FreeBSD.org>

Eliminate a pointless TLB flush from the SMP idle loop.

Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
Reviewed by: "John S. Dyson" <toor@dyson.iquest.net>


# 6d2b6a08 17-Mar-1999 Jonathan Lemon <jlemon@FreeBSD.org>

Change btrl/btsl to cmpl/movl, since each cpu now has their own copy
of private_tss, and there's no need to use a bit array. Also fixes
the problem of using `je' after btrl, since cmpl sets ZF.

Noticed by: Luoqi, on -current


# aa839b4b 28-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Micro-optimized and cleaned up the clearing of switchtime in idle().

Cleaned up the conditionals in the disgusting SMP ifdef in idle().


# e796e00d 28-May-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Some cleanups related to timecounters and weird ifdefs in <sys/time.h>.

Clean up (or if antipodic: down) some of the msgbuf stuff.

Use an inline function rather than a macro for timecounter delta.

Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.

Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()

This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.

WARNING: Programs which muck about with struct proc in userland
will have to be fixed.

Reviewed, but found imperfect by: bde


# daa2c78f 19-May-1998 Peter Dufault <dufault@FreeBSD.org>

Remove option for SCHED_FIFO. With this optional, SCHED_FIFO
is the same as RTPRIO_IDLE when it falls through to the default.


# cfa5644b 12-May-1998 John Dyson <dyson@FreeBSD.org>

Some temporary fixes to SMP to make it more scheduling and signal friendly.
This is a result of discussions on the mailing lists. Kudos to those who
have found the issue and created work-arounds. I have chosen Tor's fix
for now, before we can all work the issue more completely.
Submitted by: Tor Egge


# 74164362 06-Apr-1998 Peter Wemm <peter@FreeBSD.org>

_curpcb is always defined in globals.s instead of here in #ifdefs


# 8a6472b7 28-Mar-1998 Peter Dufault <dufault@FreeBSD.org>

Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and
_KPOSIX_PRIORITY_SCHEDULING options to work. Changes:

Change all "posix4" to "p1003_1b". Misnamed files are left
as "posix4" until I'm told if I can simply delete them and add
new ones;

Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;

Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;

Add options to LINT;

Minor fixes to P1003_1B code during testing.


# f3df61a1 04-Mar-1998 Peter Dufault <dufault@FreeBSD.org>

Reviewed by: msmith, bde long ago
Fix for RTPRIO scheduler to eliminate invalid context switches.


# 0b08f5f7 05-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Back out DIAGNOSTIC changes.


# 47cfdb16 04-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Turn DIAGNOSTIC into a new-style option.


# 5c623cb6 14-Dec-1997 Tor Egge <tegge@FreeBSD.org>

Add support for low resolution SMP kernel profiling.

- A nonprofiling version of s_lock (called s_lock_np) is used
by mcount.

- When profiling is active, more registers are clobbered in
seemingly simple assembly routines. This means that some
callers needed to save/restore extra registers.

- The stack pointer must have space for a 'fake' return address
in idle, to avoid stack underflow.


# 82566551 13-Dec-1997 John Dyson <dyson@FreeBSD.org>

After one of my analysis passes to evaluate methods for SMP TLB mgmt, I
noticed some major enhancements available for UP situations. The number
of UP TLB flushes is decreased much more than significantly with these
changes. Since a TLB flush appears to cost minimally approx 80 cycles,
this is a "nice" enhancement, equiv to eliminating between 40 and 160
instructions per TLB flush.

Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.


# 98823b23 10-Oct-1997 Peter Wemm <peter@FreeBSD.org>

Convert the VM86 option from a global option to an option only depended
on by the files that use it. Changing the VM86 option now only causes
a recompile of a dozen files or so rather than the entire kernel.


# dfd5aef3 21-Sep-1997 Peter Wemm <peter@FreeBSD.org>

Implement the parts needed for VM86 under SMP.


# 20233f27 07-Sep-1997 Steve Passe <fsmp@FreeBSD.org>

General cleanup of the lock pushdown code. They are grouped and enabled
from machine/smptests.h:

#define PUSHDOWN_LEVEL_1
#define PUSHDOWN_LEVEL_2
#define PUSHDOWN_LEVEL_3
#define PUSHDOWN_LEVEL_4_NOT


# bb36094c 05-Sep-1997 Peter Wemm <peter@FreeBSD.org>

Argh, what was I thinking?? Don't (yet) halt the CPU in the idle loop
while waiting for an interrupt (rather than spinning on the runqueue status
bits), since the other cpu can put stuff in there and the sleeping cpu may
not get an interrupt for a while. When we have a reschedule IPI, this can
come back.

Pointed out by: fsmp


# 9a3b3e8b 26-Aug-1997 Peter Wemm <peter@FreeBSD.org>

Clean up the SMP AP bootstrap and eliminate the wretched idle procs.

- We now have enough per-cpu idle context, the real idle loop has been
revived (cpu's halt now with nothing to do).
- Some preliminary support for running some operations outside the
global lock (eg: zeroing "free but not yet zeroed pages") is present
but appears to cause problems. Off by default.
- the smp_active sysctl now behaves differently. It's merely a 'true/false'
option. Setting smp_active to zero causes the AP's to halt in the idle
loop and stop scheduling processes.
- bootstrap is a lot safer. Instead of sharing a statically compiled in
stack a number of times (which has caused lots of problems) and then
abandoning it, we use the idle context to boot the AP's directly. This
should help >2 cpu support since the bootlock stuff was in doubt.
- print physical apic id in traps.. helps identify private pages getting
out of sync. (You don't want to know how much hair I tore out with this!)

More cleanup to follow, this is more of a checkpoint than a
'finished' thing.


# 48a09cf2 08-Aug-1997 John Dyson <dyson@FreeBSD.org>

VM86 kernel support.
Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>,
Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>,
and probably a lot of others.
Submitted by: Jonathan Lemon <jlemon@americantv.com>


# 570dbb53 04-Aug-1997 Steve Passe <fsmp@FreeBSD.org>

Eliminate frequent silo overflows by restoring the TEST_LOPRIO code.
This code was eliminated when the PEND_INTS algorithm was added. But it was
discovered that PEND_INTS only worsens latency for FAST_INTR() routines,
which can't be marked pending.

Noticed & debugged by: dave adkins <adkin003@gold.tc.umn.edu>


# da9f0182 30-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Converted the TEST_LOPRIO code to default.
Created mplock functions that save/restore NO registers.
Minor cleanup.


# e31521c3 20-Jul-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# 665bb8fa 14-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Tighten up asm code for TEST_PRIO and other misc. things.
Use some new defines in place of "magic numbers".


# 057b294d 30-Jun-1997 Bruce Evans <bde@FreeBSD.org>

Un-inline a call to spl0(). It is not time critical, and was only inline
because there was no non-inline spl0() to call.

Don't frob intr_nesting_level in idle() or cpu_switch(). Interrupts
are mostly disabled then, so the frobbing had little effect.


# b3196e4b 22-Jun-1997 Peter Wemm <peter@FreeBSD.org>

Preliminary support for per-cpu data pages.

This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])

There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.

This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.

Reviewed by: fsmp, dyson


# 7b3c8424 06-Jun-1997 Bruce Evans <bde@FreeBSD.org>

Preserve %fs and %gs across context switches. This has a relatively low
cost since it is only done in cpu_switch(), not for every exception.
The extra state is kept in the pcb, and handled much like the npx state,
with similar deficiencies (the state is not preserved across signal
handlers, and error handling loses state).


# 5400ed3b 31-May-1997 Peter Wemm <peter@FreeBSD.org>

Include file updates.. <machine/spl.h> -> <machine/ipl.h>, add
<machine/ipl.h> to those files that were depending on getting SWI_*
implicitly via <machine/cpufunc.h>


# 288e2230 28-May-1997 Peter Wemm <peter@FreeBSD.org>

remove no longer needed opt_smp.h includes


# 7b2a188c 28-Apr-1997 Steve Passe <fsmp@FreeBSD.org>

cleaned out an old FIXME.


# 477a642c 26-Apr-1997 Peter Wemm <peter@FreeBSD.org>

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


# 9081eec1 22-Apr-1997 John Polstra <jdp@FreeBSD.org>

Make the necessary changes so that an ELF kernel can be built. I
have successfully built, booted, and run a number of different ELF
kernel configurations, including GENERIC. LINT also builds and
links cleanly, though I have not tried to boot it.

The impact on developers is virtually nil, except for two things.
All linker sets that might possibly be present in the kernel must be
listed in "sys/i386/i386/setdefs.h". And all C symbols that are
also referenced from assembly language code must be listed in
"sys/i386/include/asnames.h". It so happens that failure to do
these things will have no impact on the a.out kernel. But it will
break the build of the ELF kernel.

The ELF bootloader works, but it is not ready to commit quite yet.


# a688c7b0 20-Apr-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Fix up the "hlt vector" change I made.
Reviewed by: bde, bde, bde


# 3845d118 14-Apr-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Forget all about APM. Instead of "hlt" call through a vector which
APM can then fiddle with. Default for the vector is to "hlt; ret"


# a2a1c95c 07-Apr-1997 Peter Wemm <peter@FreeBSD.org>

The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler than switching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.
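
To illustrate why resetting a descriptor base takes "a lot of twiddling",
here is a minimal sketch that treats an x86 segment descriptor as a raw
64-bit value. The helper names are hypothetical and this is not the kernel's
struct segment_descriptor; only the bit positions of the base field (bits
16-31, 32-39, and 56-63 of the descriptor) are taken from the architecture.

```c
#include <stdint.h>

/*
 * Store a 32-bit base into a raw 64-bit x86 segment descriptor.
 * The base is scattered over three fields:
 *   base[15:0]  -> descriptor bits 16..31
 *   base[23:16] -> descriptor bits 32..39
 *   base[31:24] -> descriptor bits 56..63
 * All other bits (limit, type, flags) are preserved.
 */
static uint64_t
desc_set_base(uint64_t desc, uint32_t base)
{
	const uint64_t base_mask =
	    (0xFFFFULL << 16) | (0xFFULL << 32) | (0xFFULL << 56);

	desc &= ~base_mask;
	desc |= (uint64_t)(base & 0xFFFF) << 16;
	desc |= (uint64_t)((base >> 16) & 0xFF) << 32;
	desc |= (uint64_t)((base >> 24) & 0xFF) << 56;
	return desc;
}

/* Reassemble the 32-bit base from its three fields. */
static uint32_t
desc_get_base(uint64_t desc)
{
	return (uint32_t)((desc >> 16) & 0xFFFF) |
	    (uint32_t)((desc >> 32) & 0xFF) << 16 |
	    (uint32_t)((desc >> 56) & 0xFF) << 24;
}
```

By contrast, updating esp0 in a shared TSS is a single aligned 32-bit store,
which is the simplification the message describes.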

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplify the pmap code to manage process contexts: we no longer have to
double map the UPAGES, which simplifies and should measurably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# ebd707d3 16-Oct-1996 Bruce Evans <bde@FreeBSD.org>

Fixed miscounting for non-statistical (GUPROF) profiling:
- use CROSSJUMP() and CROSSJUMP_LABEL() for conditional jumps from idle()
into cpu_switch() and vice versa.
- moved badsw code to after cpu_switch().

Cosmetic changes:
- moved sw0 string to be immediately after its caller (badsw).
- removed unused #include.


# e8993539 19-Sep-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Add APM_IDLE_CPU option, that is off by default.
I maintain that it saves more power to simply "hlt" the CPU than to
spend tons of time trying to tell the APM bios to do the same.
In particular if you do it 100 times a second...


# 85acc688 30-Jul-1996 Bruce Evans <bde@FreeBSD.org>

Eliminated pcb_inl. It was always 0 because context switches don't occur
in interrupt handlers.


# b1508c72 31-Jul-1996 David Greenman <dg@FreeBSD.org>

Converted timer/run queues to 4.4BSD queue style. Removed old and unused
sleep(). Implemented wakeup_one() which may be used in the future to combat
the "thundering herd" problem for some special cases.

Reviewed by: dyson


# 79df6d85 25-Jun-1996 Bruce Evans <bde@FreeBSD.org>

trap.c:
Fixed profiling of system times. It was pre-4.4Lite and didn't support
statclocks. System times were too small by a factor of 8.

Handle deferred profiling ticks the 4.4Lite way: use addupc_task() instead
of addupc(). Call addupc_task() directly instead of using the ADDUPC()
macro.

Removed vestigial support for PROFTIMER.

switch.s:
Removed addupc().

resourcevar.h:
Removed ADDUPC() and declarations of addupc().

cpu.h:
Updated a comment. i386's never were tahoe's, and the deferred profiling
tick became (possibly) multiple ticks in 4.4Lite.

Obtained from: mostly from NetBSD


# 93f4b1bf 25-Jun-1996 Bruce Evans <bde@FreeBSD.org>

Save John Polstra's initial fix for profiling for reference. The
multiplication in addupc() overflowed for addresses >= 256K, assuming
the usual profil(2) scale parameter of 0x8000. addupc() will go away
soon.

Submitted by: John Polstra <jdp@polstra.com>


# eabe0f9f 30-Apr-1996 Bruce Evans <bde@FreeBSD.org>

Don't return unused values in cpu_switch() or savectx().

Don't preserve unused registers in the NPX case in savectx().


# 68832d30 25-Apr-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Fix cpu_fork for real.

Suggested by: bde


# 7d239214 18-Apr-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Fix a bogon. cpu_fork & savectx expected cpu_switch to restore %eax,
they shouldn't.


# 0513ce7f 13-Apr-1996 Bruce Evans <bde@FreeBSD.org>

Use PCB_SAVEFPU_SIZE instead of a too-small size in savectx(). This
bug only affected FPU emulators. It might have caused bogus FPU states
in core dumps and in the child pcb after a fork. Emulated FPU states
in core dumps don't work for other reasons, and the child FPU state
is reinitialized by exec, so the problem might not have caused any
noticeable effects.

Cleaned up #includes.


# 8371872e 18-Mar-1996 Nate Williams <nate@FreeBSD.org>

Always enable interrupts before calling the APM idle/busy routines.

Suggested by: phk@FreeBSD.org


# 44f0e01b 11-Mar-1996 Nate Williams <nate@FreeBSD.org>

Removed undocumented and unused APM_SLOWSTART code.


# 267173e7 04-Feb-1996 David Greenman <dg@FreeBSD.org>

Rewrote cpu_fork so that it doesn't use pmap_activate, and removed
pmap_activate since it's not used anymore. Changed cpu_fork so that
it uses one line of inline assembly rather than calling mvesp() to
get the current stack pointer. Removed mvesp() since it is no longer
being used.


# ac474627 02-Feb-1996 David Greenman <dg@FreeBSD.org>

Killed last change - it was bogus. cpu_switch() already assumes that
return address is on the stack.


# b09fb643 29-Jan-1996 David Greenman <dg@FreeBSD.org>

savectx() strikes again: the saved stack pointer wasn't properly adjusted
to remove the return address. It's only the frame pointer and luck that
allowed the code to work at all.


# 2924d491 22-Jan-1996 David Greenman <dg@FreeBSD.org>

Simplified savectx() a little and fixed a bug that caused it to return
garbage in the child process rather than "1" like it is supposed to.

Reviewed by: bde


# db6a20e2 03-Jan-1996 Garrett Wollman <wollman@FreeBSD.org>

Converted two options over to the new scheme: USER_LDT and KTRACE.


# 32831552 21-Dec-1995 David Greenman <dg@FreeBSD.org>

Rewrote most of the ddb stack traceback code. These changes are smarter
about decoding trap/syscall/interrupt frames and generally work better
than the previous stuff.
Removed some special (incorrect) frobbing of the frame pointer that
was messing some things up with the new traceback code.


# 87b91157 10-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Staticize and cleanup.
remove a TON of #includes from machdep.


# 7dfe504f 09-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Remove various unused symbols and procedures.


# a29b63cb 03-Sep-1995 John Dyson <dyson@FreeBSD.org>

Machine dependent routines to support pre-zeroed free pages. This
significantly improves demand zero performance.


# 648c711b 16-Feb-1995 Poul-Henning Kamp <phk@FreeBSD.org>

This is the latest version of the APM stuff from HOSOKAWA, I have looked
briefly over it, and see some serious architectural issues in this stuff.

On the other hand, I doubt that we will have any solution to these issues
before 2.1, so we might as well leave this in.

Most of the stuff is bracketed by #ifdef's so it shouldn't matter too much
in the normal case.

Reviewed by: phk
Submitted by: HOSOKAWA, Tatsumi <hosokawa@mt.cs.keio.ac.jp>


# e6891db8 21-Jan-1995 Bruce Evans <bde@FreeBSD.org>

Don't count context switches here, they are already counted in mi_switch().


# b39b673d 03-Dec-1994 Bruce Evans <bde@FreeBSD.org>

i386/exception.s:
Keep track of interrupt nesting level. It is normally 0
for syscalls and traps, but is fudged to 1 for their exit
processing in case they metamorphose into an interrupt
handler.

i386/genassym.c:
Remove support for the obsolete pcb_iml and pcb_cmap2.

Add support for pcb_inl.

i386/swtch.s:
Fudge the interrupt nesting level across context switches and in
the idle loop so that the work for preemptive context switches
gets counted as interrupt time, the work for voluntary context
switches gets counted mostly as system time (the part when
curproc == 0 gets counted as interrupt time), and only truly idle
time gets counted as idle time.

Remove obsolete support (commented out and otherwise) for pcb_iml.

Load curpcb just before curproc instead of just after so that
curpcb is always valid if curproc is. A few more changes like
this may fix tracing through context switches.

Remove obsolete function swtch_to_inactive().

include/cpu.h:
Use the new interrupt nesting level variable to implement a
non-fake CLKF_INTR() so that accounting for the interrupt state
works.

You can use top, iostat or (best) an up to date systat to see
interrupt overheads. I see the expected huge interrupt overheads
for ISA devices (on a 486DX/33, about 55% for an IDE drive
transferring 1250K/sec and the same for a WD8013EBT network card
transferring 1100K/sec). The huge interrupt overheads for serial
devices are unfortunately normally invisible.

include/pcb.h:
Remove the obsolete pcb_iml and pcb_cmap2. Replace them by
padding to preserve binary compatibility.

Use part of the new padding for pcb_inl.

isa/icu.s:
isa/vector.s:
Keep track of interrupt nesting level.


# 54d02404 30-Oct-1994 Bruce Evans <bde@FreeBSD.org>

locore.s:
Build a dummy frame at the top of tmpstk to help debuggers trace the stack
when the system is idle.

swtch.s: idle():
Initialize the frame pointer so that debuggers don't try to trace a bogus
stack.

Load the frame pointer, load the stack pointer and switch out the old
stack in the unique order that never leaves one of the pointers
invalid so that debuggers can trace idle(). Disabling interrupts
provides sufficient validity for normal operation, but debuggers use
(trace) traps.


# a0181c75 25-Oct-1994 David Greenman <dg@FreeBSD.org>

Moved initialization of tmpstk so that it immediately follows the kernel
text. Fixed rounding bug that caused the last page of kernel text to be
read/write instead of read-only. This is important now that tmpstk can
crash into it. Removed +4 bias of tmpstk because it screws up ddb's
ability to traceback correctly.


# 7216391e 01-Oct-1994 David Greenman <dg@FreeBSD.org>

"idle priority" support. Based on code from Henrik Vestergaard Draboel,
but substantially rewritten by me.


# 22414e53 30-Sep-1994 David Greenman <dg@FreeBSD.org>

Laptop Advanced Power Management support by HOSOKAWA Tatsumi.

Submitted by: HOSOKAWA Tatsumi


# 35aee080 01-Sep-1994 David Greenman <dg@FreeBSD.org>

Converted P_LINK -> P_FORW, P_RLINK -> P_BACK, minor optimization.


# e8fb0b2c 31-Aug-1994 David Greenman <dg@FreeBSD.org>

Realtime priority scheduling support.

Submitted by: Henrik Vestergaard Draboel


# 8fdce837 30-Aug-1994 Bruce Evans <bde@FreeBSD.org>

Don't define LOCORE (as nothing) in sources. It is now defined
consistently (as 1) in Makefile.i386 for all assembler sources.


# 27774cbb 19-Aug-1994 David Greenman <dg@FreeBSD.org>

Removed bogus save of CMAP2.


# ed3f8954 06-Aug-1994 David Greenman <dg@FreeBSD.org>

Made the tmpstk start at tmpstk. Not doing so causes problems for the
debugger.

Submitted by: John Dyson


# d23d07ef 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Merged in post-1.1.5 work done by myself and John Dyson. This includes:

me:
1) TLB flush optimization that effectively eliminates half of all of the
TLB flushes. This works by only flushing the TLB when a page is "present"
in memory (i.e. the valid bit is set in the page table entry). See section
5.3.5 of the Intel 386 Programmer's Reference Manual.
2) The handling of "CMAP" has been improved to catch attempts at multiple
simultaneous use.

John:
1) Added pmap_qenter/pmap_qremove functions for fast mapping of pages into
the kernel. This is for future optimizations and support for the upcoming
merged VM/buffer cache.

Reviewed by: John Dyson
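
The TLB flush optimization in item 1 can be sketched roughly as follows. This is an illustrative model, not the pmap.c code from this commit: pte_update() and tlb_flush() are hypothetical names, and the flush is replaced by a counter so the effect is visible. It relies on the x86 property that entries without the valid bit set are not cached in the TLB.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_V 0x001u             /* i386 "present" (valid) bit in a PTE */

typedef uint32_t pt_entry_t;

/* Hypothetical stand-in for a real TLB flush; it only counts calls. */
static unsigned tlb_flush_count;

static void
tlb_flush(void)
{
	tlb_flush_count++;
}

/*
 * Store a new page table entry.  The TLB can only hold a translation
 * for the old entry if that entry was valid, so the flush is skipped
 * when the old entry had PG_V clear.  Returns whether a flush happened.
 */
static bool
pte_update(pt_entry_t *pte, pt_entry_t newpte)
{
	bool was_valid = (*pte & PG_V) != 0;

	*pte = newpte;
	if (was_valid)
		tlb_flush();
	return (was_valid);
}
```

Entering a mapping into an empty slot then costs no flush, while replacing a live mapping still does.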


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 0e195446 20-Apr-1994 David Greenman <dg@FreeBSD.org>

Bug fixes and performance improvements from John Dyson and myself:

1) check va before clearing the page clean flag. Not doing so was
causing the vnode pager error 5 messages when paging from
NFS. (pmap.c)
2) put back interrupt protection in idle_loop. Bruce didn't think
it was necessary, John insists that it is (and I agree). (swtch.s)
3) various improvements to the clustering code (vm_machdep.c). It's
now enabled/used by default.
4) bad disk blocks are now handled properly when doing clustered IOs.
(wd.c, vm_machdep.c)
5) bogus bad block handling fixed in wd.c.
6) algorithm improvements to the pageout/pagescan daemons. It's amazing
how well 4MB machines work now.


# d2306226 02-Apr-1994 David Greenman <dg@FreeBSD.org>

New interrupt code from Bruce Evans. In addition to Bruce's attached
list of changes, I've made the following additional changes:

1) i386/include/ipl.h renamed to spl.h as the name conflicts with the
file of the same name in i386/isa/ipl.h.
2) changed all use of *mask (i.e. netmask, biomask, ttymask, etc) to
*_imask (net_imask, etc).
3) changed vestige of splnet use in if_is to splimp.
4) got rid of "impmask" completely (Bruce had gotten rid of netmask),
and are now using net_imask instead.
5) dozens of minor cruft to glue in Bruce's changes.

These require changes I made to config(8) as well, and thus it must
be rebuilt.

-DG

from Bruce Evans:

sio:
o No diff is supplied. Remove the define of setsofttty(). I hope
that is enough.

*.s:
o i386/isa/debug.h no longer exists. The event counters became too
much trouble to maintain. All function call entry and exception
entry counters can be recovered by using profiling kernel (the new
profiling supports all entry points; however, it is too slow to
leave enabled all the time; it also). Only BDBTRAP() from debug.h
is now used. That is moved to exception.s. It might be worth
preserving SHOW_BITS() and calling it from _mcount() (if enabled).
o T_ASTFLT is now only set just before calling trap().
o All exception handlers set SWI_AST_MASK in cpl as soon as possible
after entry and arrange for _doreti to restore it atomically with
exiting. It is not possible to set it atomically with entering
the kernel, so it must be checked against the user mode bits in
the trap frame before committing to using it. There is no place
to store the old value of cpl for syscalls or traps, so there are
some complications restoring it.

Profiling stuff (mostly in *.s):
o Changes to kern/subr_mcount.c, gcc and gprof are not supplied yet.
o All interesting labels `foo' are renamed `_foo' and all
uninteresting labels `_bar' are renamed `bar'. A small change
to gprof allows ignoring labels not starting with underscores.
o MCOUNT_LABEL() is to provide names for counters for times spent
in exception handlers.
o FAKE_MCOUNT() is a version of MCOUNT() suitable for exception
handlers. Its arg is the pc where the exception occurred. The
new mcount() pretends that this was a call from that pc to a
suitable MCOUNT_LABEL().
o MEXITCOUNT is to turn off any timer started by MCOUNT().

/usr/src/sys/i386/i386/exception.s:
o The non-BDB BPTTRAP() macros were doing a sti even when interrupts
were disabled when the trap occurred. The (fixed) sti is
actually a no-op unless you have my changes to machdep.c that make
the debugger trap gates interrupt gates, but fixing that would
make the ifdefs messier. ddb seems to be unharmed by both
interrupts always disabled and always enabled (I had the branch in
the fix back to front for some time :-().
o There is no known pushal bug.
o tf_err can be left as garbage for syscalls.

/usr/src/sys/i386/i386/locore.s:
o Fix and update BDE_DEBUGGER support.
o ENTRY(btext) before initialization was dangerous.
o Warm boot shot was longer than intended.

/usr/src/sys/i386/i386/machdep.c:
o DON'T APPLY ALL OF THIS DIFF. It's what I'm using, but may require
other changes.
Use the following:
o Remove aston() and setsoftclock().
Maybe use the following:
o No netisr.h.
o Spelling fix.
o Delay to read the Rebooting message.
o Fix for vm system unmapping a reduced area of memory
after bounds_check_with_label() reduces the size of
a physical i/o for a partition boundary. A similar
fix is required in kern_physio.c.
o Correct use of __CONCAT. It never worked here for non-
ANSI cpp's. Is it time to drop support for non-ANSI?
o gdt_segs init. 0xffffffffUL is bogus because ssd_limit
is not 32 bits. The replacement may have the same
value :-), but is more natural.
o physmem was one page too low. Confusing variable names.
Don't use the following:
o Better numbers of buffers. Each 8K page requires up to
16 buffer headers. On my system, this results in 5576
buffers containing [up to] 2854912 bytes of memory.
The usual allocation of about 384 buffers only holds
192K of disk if you use it on an fs with a block size
of 512.
o gdt changes for bdb.
o *TGT -> *IDT changes for bdb.
o #ifdefed changes for bdb.
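
The buffer figures quoted in the "Better numbers of buffers" item can be checked with a few lines of arithmetic. The helper names below are made up for illustration; only the numbers (8K pages, 512-byte blocks, 5576 and 384 buffers) come from the log text.

```c
/* Buffer headers needed to cover one page at a given block size. */
static unsigned long
headers_per_page(unsigned long page_size, unsigned long block_size)
{
	return (page_size / block_size);
}

/* Bytes of disk data held when every buffer holds one block. */
static unsigned long
cached_bytes(unsigned long nbuf, unsigned long block_size)
{
	return (nbuf * block_size);
}
```

With 512-byte blocks, headers_per_page(8192, 512) gives the 16 headers per 8K page, cached_bytes(5576, 512) gives the 2854912 bytes, and cached_bytes(384, 512) gives the 192K mentioned above.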

/usr/src/sys/i386/i386/microtime.s:
o Use the correct asm macros. I think asm.h was copied from Mach
just for microtime and isn't used now. It certainly doesn't
belong in <sys>. Various macros are also duplicated in
sys/i386/boot.h and libc/i386/*.h.
o Don't switch to and from the IRR; it is guaranteed to be selected
(default after ICU init and explicitly selected in isa.c too, and
never changed until the old microtime clobbered it).

/usr/src/sys/i386/i386/support.s:
o Non-essential changes (none related to spls or profiling).
o Removed slow loads of %gs again. The LDT support may require
not relying on %gs, but loading it is not the way to fix it!
Some places (copyin ...) forgot to load it. Loading it clobbers
the user %gs. trap() still loads it after certain types of
faults so that fuword() etc can rely on it without loading it
explicitly. Exception handlers don't restore it. If we want
to preserve the user %gs, then the fastest method is to not
touch it except for context switches. Comparing with
VM_MAXUSER_ADDRESS and branching takes only 2 or 4 cycles on
a 486, while loading %gs takes 9 cycles and using it takes
another.
o Fixed a signed branch to unsigned.
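
The log does not say which branch was changed, but the general hazard of a signed branch in an address comparison can be sketched as below. USER_LIMIT is a hypothetical value, not the real VM_MAXUSER_ADDRESS, and both helpers are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define USER_LIMIT 0xC0000000u  /* hypothetical user/kernel boundary */

/*
 * Signed comparison, as a "jl"-style branch would do.  The casts to
 * int32_t are implementation-defined in C; on two's complement
 * compilers they reinterpret the bits, which is the case modelled here.
 */
static bool
below_limit_signed(uint32_t addr)
{
	return ((int32_t)addr < (int32_t)USER_LIMIT);
}

/* Unsigned comparison, as a "jb"-style branch would do. */
static bool
below_limit_unsigned(uint32_t addr)
{
	return (addr < USER_LIMIT);
}
```

Because the limit has its top bit set, the signed version treats it as a negative number and wrongly rejects a low user address such as 0x1000, while the unsigned version accepts it.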

/usr/src/sys/i386/i386/swtch.s:
o Move spl0() outside of idle loop.
o Remove cli/sti from idle loop. sw1 does a cli, and in the
unlikely event of an interrupt occurring and whichqs becoming
zero, sw1 will just jump back to _idle.
o There's no spl0() function in asm any more, so use splz().
o swtch() doesn't need to be superaligned, at least with the
new mcounting.
o Fixed a signed branch to unsigned.
o Removed astoff().

/usr/src/sys/i386/i386/trap.c:
o The decentralized extern decls were inconsistent, of course.
o Fixed typo MATH_EMULTATE in comments. */
o Removed unused variables.
o Old netmask is now impmask; print it instead. Perhaps we
should print some of the new masks.
o BTW, trap() should not print anything for normal debugger
traps.

/usr/src/sys/i386/include/asmacros.h:
o DON'T APPLY ALL OF THIS DIFF. Just use some of the null macros
as necessary.

/usr/src/sys/i386/include/cpu.h:
o CLKF_BASEPRI() changes since cpl == SWI_AST_MASK is now normal
while the kernel is running.
o Don't use var++ to set boolean variables. It fails after a mere
4G times :-) and is slower than storing a constant on [3-4]86s.
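
The wraparound mentioned for var++ can be shown with a narrower type so that it happens quickly. This sketch uses an 8-bit flag, which wraps after 256 increments; the 32-bit case in the log wraps after 4G in the same way. The function names are made up for illustration.

```c
#include <stdint.h>

/* "Set" a flag by incrementing it, repeated the given number of times. */
static uint8_t
set_by_increment(uint8_t flag, unsigned times)
{
	while (times-- > 0)
		flag++;         /* wraps to 0 after 256 increments */
	return (flag);
}

/* Set a flag by storing a constant, repeated the given number of times. */
static uint8_t
set_by_store(uint8_t flag, unsigned times)
{
	while (times-- > 0)
		flag = 1;       /* always leaves the flag true */
	return (flag);
}
```

After 256 "sets", the incremented flag reads as false again, while the stored flag is still 1.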

/usr/src/sys/i386/include/cpufunc.h:
o DON'T APPLY ALL OF THIS DIFF. You need mainly the include of
<machine/ipl.h>. Unfortunately, <machine/ipl.h> is needed by
almost everything for the inlines.

/usr/src/sys/i386/include/ipl.h:
o New file. Defines spl inlines and SWI macros and declares most
variables related to hard and soft interrupt masks.

/usr/src/sys/i386/isa/icu.h:
o Moved definitions to <machine/ipl.h>

/usr/src/sys/i386/isa/icu.s:
o Software interrupts (SWIs) and delayed hardware interrupts (HWIs)
are now handled uniformly, and dispatching them from splx() is
more like dispatching them from _doreti. The dispatcher is
essentially (*handler[ffs(ipending & ~cpl)])().
o More care (not quite enough) is taken to avoid unbounded nesting
of interrupts.
o The interface to softclock() is changed so that a trap frame is
not required.
o Fast interrupt handlers are now handled more uniformly.
Configuration is still too early (new handlers would require
bits in <machine/ipl.h> and functions to vector.s).
o splnnn() and splx() are no longer here; they are inline functions
(could be macros for other compilers). splz() is the nontrivial
part of the old splx().
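
A rough C model of the dispatcher expression described for icu.s is sketched below. The names ipending, cpl, handler and ffs() come from the log; everything else (the table size, the demo handlers, the loop structure, the 0-based table index) is an assumption made for illustration, and the real code is assembly that also manages cpl around each handler.

```c
#include <stddef.h>
#include <strings.h>            /* ffs() */

#define NHANDLERS 16            /* assumed table size for this sketch */

typedef void (*handler_t)(void);

static unsigned ipending;       /* bit n set: interrupt n is pending */
static unsigned cpl;            /* bit n set: interrupt n is masked */
static handler_t handler[NHANDLERS];

/* Demo handlers that record the order in which they ran. */
static int order[4];
static int norder;
static void demo_handler_1(void) { order[norder++] = 1; }
static void demo_handler_3(void) { order[norder++] = 3; }

/*
 * Run every pending, unmasked handler, lowest bit first.
 * ffs() returns a 1-based bit index, or 0 when no bit is set.
 * Returns the number of handlers dispatched.
 */
static int
dispatch_pending(void)
{
	int bit, ran = 0;

	while ((bit = ffs((int)(ipending & ~cpl & 0xffffu))) != 0) {
		unsigned n = (unsigned)bit - 1;

		ipending &= ~(1u << n);
		if (handler[n] != NULL)
			handler[n]();
		ran++;
	}
	return (ran);
}
```

Masked interrupts stay set in ipending and are picked up by a later call once cpl no longer blocks them.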

/usr/src/sys/i386/isa/ipl.h
o New file. Supposed to have only bus-dependent stuff. Perhaps
the h/w masks should be declared here.

/usr/src/sys/i386/isa/isa.c:
o DON'T APPLY ALL OF THIS DIFF. You need only things involving
*mask and *MASK and comments about them. netmask is now a pure
software mask. It works like the softclock mask.

/usr/src/sys/i386/isa/vector.s:
o Reorganize AUTO_EOI* macros.
o Option FAST_INTR_HANDLER_USERS_ES for people who don't trust
fastintr handlers.
o fastintr handlers need to metamorphose into ordinary interrupt
handlers if their SWI bit has become set. Previously, sio had
unintended latency for handling output completions and input
of SLIP framing characters because this was not done.

/usr/src/sys/net/netisr.h:
o The machine-dependent stuff is now imported from <machine/ipl.h>.

/usr/src/sys/sys/systm.h
o DON'T APPLY ALL OF THIS DIFF. You need mainly the different
splx() prototype. The spl*() prototypes are duplicated as
inlines in <machine/ipl.h> but they need to be duplicated here
in case there are no inlines. I sent systm.h and cpufunc.h
to Garrett. We agree that spl0 should be replaced by splnone
and not the other way around like I've done.

/usr/src/sys/kern/kern_clock.c
o splsoftclock() now lowers cpl so the direct call to softclock()
works as intended.
o softclock() interface changed to avoid passing the whole frame
(some machines may need another change for profile_tick()).
o profiling renamed _profiling to avoid ANSI namespace pollution.
(I had to improve the mcount() interface and may as well fix it.)
The GUPROF variant doesn't actually reference profiling here,
but the 'U' in GUPROF should mean to select the microtimer
mcount() and not change the interface.


# da59a31c 31-Jan-1994 David Greenman <dg@FreeBSD.org>

WINE/user LDT support from John Brezak, ported to FreeBSD by Jeffrey Hsu
<hsu@soda.berkeley.edu>.


# d64f660f 17-Jan-1994 David Greenman <dg@FreeBSD.org>

Improvements mostly from John Dyson, with a little bit from me.

* Removed pmap_is_wired
* added extra cli/sti protection in idle (swtch.s)
* slight code improvement in trap.c
* added lots of comments
* improved paging and other algorithms in VM system


# 7f8cb368 14-Jan-1994 David Greenman <dg@FreeBSD.org>

"New" VM system from John Dyson & myself. For a run-down of the
major changes, see the log of any affected file in the sys/vm
directory (swap_pager.c for instance).


# 0967373e 12-Nov-1993 David Greenman <dg@FreeBSD.org>

First steps in rewriting locore.s, and making info useful
when the machine panics.

i386/i386/locore.s:
1) got rid of most .set directives that were being used like
#define's, and replaced them with appropriate #define's in
the appropriate header files (accessed via genassym).
2) added comments to header inclusions and global definitions,
and global variables
3) replaced some hardcoded constants with cpp defines (such as
PDESIZE and others)
4) aligned all comments to the same column to make them easier to
read
5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to
/sys/i386/include/asmacros.h
6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code
7) added new global '_KERNend' to store last location+1 of kernel
8) cleaned up zeroing of bss so that only bss is zeroed
9) fix zeroing of page tables so that it really does zero them all
- not just if they follow the bss.
10) rewrote page table initialization code so that 1) works correctly
and 2) write protects the kernel text by default
11) properly initialize the kernel page directory, upages, p0stack PT,
and page tables. The previous scheme was more than a bit
screwy.
12) change allocation of virtual area of IO hole so that it is
fixed at KERNBASE + 0xa0000. The previous scheme put it
right after the kernel page tables and then later expected
it to be at KERNBASE + 0xa0000
13) change multiple bogus settings of user read/write of various
areas of kernel VM - including the IO hole; we should never
be accessing the IO hole in user mode through the kernel
page tables
14) split kernel support routines such as bcopy, bzero, copyin,
copyout, etc. into a separate file 'support.s'
15) split swtch and related routines into a separate 'swtch.s'
16) split routines related to traps, syscalls, and interrupts
into a separate file 'exception.s'
17) remove some unused global variables from locore that got
inserted by Garrett when he pulled them out of some .h
files.

i386/isa/icu.s:
1) clean up global variable declarations
2) move in declaration of astpending and netisr

i386/i386/pmap.c:
1) fix calculation of virtual_avail. It previously was calculated
to be right in the middle of the kernel page tables - not
a good place to start allocating kernel VM.
2) properly allocate kernel page dir/tables etc out of kernel map
- previously only took out 2 pages.

i386/i386/machdep.c:
1) modify boot() to print a warning that the system will reboot in
PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user
abort with a key on the console. The machine will wait
forever if a key is typed before the reboot. The default is
15 seconds, but can be set to 0 to mean don't wait at all,
-1 to mean wait forever, or any positive value to wait for
that many seconds.
2) print "Rebooting..." just before doing it.
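
The PANIC_REBOOT_WAIT_TIME semantics described in item 1 can be summarized as a small decision function. This is a sketch of the described behaviour, not the machdep.c code: reboot_policy() and the enum are hypothetical, and treating values below -1 like -1 is an assumption the log does not cover.

```c
enum reboot_action {
	REBOOT_NOW,             /* 0: do not wait at all */
	REBOOT_AFTER_DELAY,     /* positive: wait that many seconds */
	REBOOT_WAIT_FOREVER     /* -1, or a key was typed during the wait */
};

/*
 * Decide what boot() should do after a panic, given the configured
 * wait time and whether a console key was typed during the countdown.
 */
static enum reboot_action
reboot_policy(int wait_time, int key_typed)
{
	if (wait_time == 0)
		return (REBOOT_NOW);
	if (wait_time < 0)      /* assumption: any negative means forever */
		return (REBOOT_WAIT_FOREVER);
	return (key_typed ? REBOOT_WAIT_FOREVER : REBOOT_AFTER_DELAY);
}
```

With the default of 15 seconds, reboot_policy(15, 0) counts down and reboots, while reboot_policy(15, 1) holds the machine so the panic message can be read.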

kern/subr_prf.c:
1) remove PANICWAIT as it is deprecated by the change to machdep.c

i386/i386/trap.c:
1) add table of trap type strings and use it to print a real trap/
panic message rather than just a number. Lots of work to
be done here, but this is the first step. Symbolic traceback
is in the TODO.

i386/i386/Makefile.i386:
1) add support to build support.s, exception.s and swtch.s

...and various changes to various header files to make all of the
above happen.