History log of /freebsd-10.1-release/sys/amd64/amd64/genassym.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 271999 22-Sep-2014 jhb

MFC 270850,271053,271192,271717:
Save and restore FPU state across suspend and resume on i386.
- Create a separate structure for per-CPU state saved across suspend and
resume that is a superset of a pcb.
- Store the FPU state for suspend and resume in the new structure
(for amd64, this moves it out of the PCB)
- On both i386 and amd64, all of the FPU suspend/resume handling is now
done in C.

Approved by: re (hrs)


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 255217 04-Sep-2013 kib

Tidy up some loose ends in the PCID code:

- Restore the pre-PCID TLB shootdown handlers for whole address space
and single page invalidation asm code, and assign the IPI handler to
them when PCID is not supported or disabled. Old handlers have
linear control flow. But, still use the common return sequence.

- Stop using pcpu for INVPCID descriptors in the invlrg handler. It
is enough to allocate descriptors on the stack. As result, two
SWAPGS instructions are shaved off from the code for Haswell+.

- Fix the reverted condition in invlrng for checking of the PCID
support [1], also in invlrng check that pmap is kernel pmap before
performing other tests. For the kernel pmap, which provides global
mappings, the INVLPG must be used for invalidation always.

- Save the pre-computed pmap' %CR3 register in the struct pmap. This
allows to remove several checks for pm_pcid validity when %CR3 is
reloaded [2].

Noted by: gibbs [1]
Discussed with: alc [2]
Tested by: pho, flo
Sponsored by: The FreeBSD Foundation


# 255060 30-Aug-2013 kib

Implement support for the process-context identifiers ('PCID') on
Intel CPUs. The feature tags TLB entries with the Id of the address
space and allows to avoid TLB invalidation on the context switch, it
is available only in the long mode. In the microbenchmarks, using the
PCID decreased latency of the context switches by ~30% on SandyBridge
class desktop CPUs, measured with the lat_ctx program from lmbench.

If available, use INVPCID instruction when a TLB entry in non-current
address space needs to be invalidated. The instruction is typically
available on the Haswell.

If needed, the use of PCID can be turned off with the
vm.pmap.pcid_enabled loader tunable set to 0. The state of the
feature is reported by the vm.pmap.pcid_enabled sysctl. The sysctl
vm.pmap.pcid_save_cnt reports the number of context switches which
avoided invalidating the TLB; compare with the total number of context
switches, available as sysctl vm.stats.sys.v_swtch.

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
Tested by: pho, bf


# 251720 13-Jun-2013 neel

Remove unused macros PTESHIFT, PDESHIFT, PDPESHIFT and PML4ESHIFT.

Reviewed by: alc


# 250423 09-May-2013 dchagin

Retire write-only PCB_GS32BIT pcb flag on amd64.


# 236772 08-Jun-2012 iwasaki

Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of
suspend/resume procedures are minimized among them.

common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().

amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().

i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
amd64 code.

Reviewed by: attilio@, acpi@


# 230426 21-Jan-2012 kib

Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs. As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
and the required size of FPU save area. The hw.use_xsave tunable is
provided for disabling XSAVE, and hw.xsave_mask may be used to
select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
(run-time sized) user save area on the top of the kernel stack,
right above the PCB. Reorganize the thread0 PCB initialization to
postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
well. FPU state is only useful for suspend, where it is saved in
dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
mcontext_t has a valid pointer to out-of-struct extended FPU
state. Signal handlers are supplied with stack-allocated fpu
state. The sigreturn(2) and setcontext(2) syscall honour the flag,
allowing the signal handlers to inspect and manipilate extended
state in the interrupted context.

- The getcontext(2) never returns extended state, since there is no
place in the fixed-sized mcontext_t to place variable-sized save
area. And, since mcontext_t is embedded into ucontext_t, makes it
impossible to fix in a reasonable way. Instead of extending
getcontext(2) syscall, provide a sysarch(2) facility to query
extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
consumers, making it opaque. Internally, struct fpu_kern_ctx now
contains a space for the extended state. Convert in-kernel consumers
of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after: 1 month


# 225475 11-Sep-2011 kib

Perform amd64-specific microoptimizations for native syscall entry
sequence. The effect is ~1% on the microbenchmark.

In particular, do not restore registers which are preserved by the
C calling sequence. Align the jump target. Avoid unneeded memory
accesses by calculating some data in syscall entry trampoline.

Reviewed by: jhb
Approved by: re (bz)
MFC after: 2 weeks


# 224187 18-Jul-2011 attilio

- Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
This area can be widely improved by applying the same to other
architectures and likely finding an unified approach among them and
move the whole code to be MI. More work in this area is expected to
happen fairly soon.

No MFC is previewed for this patch.

Tested by: pluknet
Reviewed by: jhb
Approved by: re (kib)


# 221032 25-Apr-2011 rmacklem

Fix the experimental NFS client so that it does not bogusly
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.
Move the files used for a diskless NFS root from sys/nfsclient
to sys/nfs in preparation for them to be used by both NFS
clients. Also, move the declaration of the three global data
structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c
so that they are defined when either client uses them.

Reviewed by: jhb
MFC after: 2 weeks


# 216634 21-Dec-2010 jkim

Improve PCB flags handling and make it more robust. Add two new functions
for manipulating pcb_flags. These inline functions are very similar to
atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK
prefix for SMP. Add comments about the rationale[1]. Use these functions
wherever possible. Although there are some places where it is not strictly
necessary (e.g., a PCB is copied to create a new PCB), it is done across
the board for sake of consistency. Turn pcb_full_iret into a PCB flag as
it is safe now. Move rarely used fields before pcb_flags and reduce size
of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in
the neighborhood.

Reviewed by: kib
Submitted by: kib[1]
MFC after: 2 months


# 216253 07-Dec-2010 kib

Retire write-only PCB_FULLCTX pcb flag on amd64.

Reminded by: Petr Salinger <Petr.Salinger seznam cz>
Tested by: pho
MFC after: 1 week


# 214631 01-Nov-2010 jhb

Move <machine/apicreg.h> to <x86/apicreg.h>.


# 210780 02-Aug-2010 jkim

Rearrange struct pcb. r177532 (CVS r1.64 of pcb.h) moved pcb_flags to make
better use of cache lines by placing it before pcb_save (now pcb_user_save),
which is moved to the end of pcb since r210777.


# 210777 02-Aug-2010 jkim

- Merge savectx2() with savectx() and struct xpcb with struct pcb. [1]
savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs). Thus,
saving additional information does not hurt and it may be even beneficial.
Unfortunately, struct pcb has grown larger to accommodate more data.
Move 512-byte long pcb_user_save to the end of struct pcb while I am here.
- savectx() now saves FPU state unconditionally and copy it to the PCB of
FPU thread if necessary. This gives panic dump and kdb a chance to take
a look at the current FPU state even if the FPU is "supposedly" not used.
- Resuming CPU now unconditionally reinitializes FPU. If the saved FPU
state was irrelevant, it could be in an unknown state.

Suggested by: bde [1]


# 210614 29-Jul-2010 jkim

Rename PCB_USER_FPU to PCB_USERFPU not to clash with a macro from fpu.h.


# 210514 26-Jul-2010 jkim

Re-implement FPU suspend/resume for amd64. This removes superfluous uses
of critical_enter(9) and critical_exit(9) by fpugetregs() and fpusetregs().
Also, we do not touch PCB flags any more.

MFC after: 1 month


# 195486 09-Jul-2009 kib

Restore the segment registers and segment base MSRs for amd64 syscall
return path only when neither thread was context switched while
executing syscall code nor syscall explicitely modified LDT or MSRs.

Save segment registers in trap handlers before interrupts are enabled,
to not allow context switches to happen before registers are saved.
Use separated byte in pcb for indication of fast/full return, since
pcb_flags are not synchronized with context switches.

The change puts back syscall microbenchmark numbers that were slowed
down after commit of the support for LDT on amd64.

Reviewed by: jeff
Tested (and tested, and tested ...) by: pho
Approved by: re (kensmith)


# 195228 01-Jul-2009 dfr

Don't include rpcv2.h - it has been removed.

Submitted by: ed@
Approved by: re


# 190629 01-Apr-2009 jkim

Garbage collect unused MSR_GSBASE since r190620.

The only consumer was exception.S and specialreg.h is directly included now.
Note no md5 changes were observed for all assym.s consumers with this.


# 190626 01-Apr-2009 jkim

Garbage collect unused stack segment since r190620.


# 190620 01-Apr-2009 kib

Save and restore segment registers on amd64 when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-depended ptrace command is added to set segment
registers for debugged process.

In collaboration with: pho
Discussed with: peter
Reviewed by: jhb
Linuxolator tested by: dchagin


# 189903 16-Mar-2009 jkim

Initial suspend/resume support for amd64.

This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.


# 185991 12-Dec-2008 jkoshy

Expose symbol `PMC_FN_USER_CALLCHAIN' to assembler code.


# 182868 08-Sep-2008 kib

The pcb_gs32p should be per-cpu, not per-thread pointer. This is
location in GDT where the segment descriptor from pcb_gs32sd is
copied, and the location is in GDT local to CPU.

Noted and reviewed by: peter
MFC after: 1 week


# 180992 30-Jul-2008 kib

Bring back the save/restore of the %ds, %es, %fs and %gs registers for
the 32bit images on amd64.

Change the semantic of the PCB_32BIT pcb flag to request the context
switch code to operate on the segment registers. Its previous meaning
of saving or restoring the %gs base offset is assigned to the new
PCB_GS32BIT flag.

FreeBSD 32bit image activator sets the PCB_32BIT flag, while Linux 32bit
emulation sets PCB_32BIT | PCB_GS32BIT.

Reviewed by: peter
MFC after: 2 weeks


# 179049 16-May-2008 attilio

Removed unused assembly offsets for structures digging.


# 177533 23-Mar-2008 peter

Export TDP_KTHREAD to asm files.


# 173855 23-Nov-2007 jkoshy

MFP4: Add assembly language symbols used by hwpmc(4)'s callchain capture.


# 172212 17-Sep-2007 peter

Fix an undefined symbol that as/ld neglected to flag as a problem. It
was used in assembler code in such a way that no unresolved relocation
records were generated, so ld didn't flag the problem. You can see
this with an 'nm' of the kernel. There will be 'U MAXCPU' on SMP systems.

The impact of this is that the intrcount/intrnames arrays do not have
the intended amount of space reserved. This could lead to interesting
problems due to the arrays being present in the middle of kernel code.
An overflow would be rather interesting as executable code would be used
as per-cpu incrementing interrupt counters.

This fixes it for now by exporting MAXCPU to the assembler. A better fix
might be to define these data structures in C - they're only referenced
in the kernel from C code these days anyway.

Approved by: re (kensmith)


# 172207 17-Sep-2007 jeff

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


# 170368 06-Jun-2007 davidxu

Backout experimental adaptive-spin umtx code.


# 170309 04-Jun-2007 jeff

- Expose td_lock to assembly so it may be used in cpu_switch().


# 168035 29-Mar-2007 jkim

MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.

Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by: emulation


# 165369 20-Dec-2006 davidxu

Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allow us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it hardly
can be implemented efficiently without wasting cpu cycles, however
exporting thread running state unlikely will be implemented soon as
it has to design and stablize interfaces. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used a spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130


# 164760 30-Nov-2006 jb

Turn console printf buffering into a kernel option and only on
by default for sun4v where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.


# 163858 01-Nov-2006 jb

Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by: scottl, jhb
MFC after: 2 weeks


# 150647 27-Sep-2005 peter

Kill pcb_rflags. It served no purpose.

Reported by: bde


# 149526 27-Aug-2005 jkoshy

- Special-case NMI handling on the AMD64.

On entry or exit from the kernel the 'alltraps' and 'doreti' code
used taken by normal traps disables interrupts to protect the
critical sections where it is setting up %gs.

This protection is insufficient in the presence of NMIs since NMIs
can be taken even when the processor has disabled normal interrupts.
Thus the NMI handler needs to actually read MSR_GBASE on entry to
the kernel to determine whether a swap of %gs using 'swapgs' is
needed. However, reads of MSRs are expensive and integrating this
check into the 'alltraps'/'doreti' path would penalize normal
interrupts.

- Teach DDB about the 'nmi_calltrap' symbol.

Reviewed by: bde, peter (older versions of this change)


# 137917 20-Nov-2004 das

Remove references to U area and garbage collect includes.

Reviewed by: arch@


# 129309 16-May-2004 peter

Checkpoint some of what I was starting to tinker with for having some
different context support for 32 vs 64 bit processes. This simply omits
the save/restore of the segment selector registers for non 32 bit
processes. This avoids the rdmsr/rwmsr juggling when restoring %gs
clobbers the kernel msr that holds the gsbase.

However, I suspect it might be better to conditionally do this at
user<->kernel transition where we wouldn't need to do the juggling in the
first place. Or have per-thread extended context save/restore hooks.


# 127914 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 125178 28-Jan-2004 peter

Export PCB_DR* symbols


# 122940 21-Nov-2003 peter

Cosmetic and/or trivial sync up with i386.

Approved by: re (rwatson)


# 122849 17-Nov-2003 peter

Initial landing of SMP support for FreeBSD/amd64.

- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
(in particular, bootstrap)
- This is still a WIP. It seems that there are some highly bogus bioses
on nVidia nForce3-150 boards. I can't stress how broken these boards
are. I have a workaround in mind, but right now the Asus SK8N is broken.
The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
atpic' in addition, because they somehow managed to forget to connect the
8254 timer to the apic, even though its in the same silicon! ARGH!
This directly violates the ACPI spec.


# 120595 30-Sep-2003 jeff

- Remove the definition for TD_SWITCHIN as it is not used.

Approved by: peter


# 118031 25-Jul-2003 obrien

Use __FBSDID().

Brought to you by: a boring talk at Ottawa Linux Symposium


# 115251 23-May-2003 peter

Major pmap rework to take advantage of the larger address space on amd64
systems. Of note:
- Implement a direct mapped region using 2MB pages. This eliminates the
need for temporary mappings when getting ptes. This supports up to
512GB of physical memory for now. This should be enough for a while.
- Implement a 4-tier page table system. Most of the infrastructure is
there for 128TB of userland virtual address space, but only 512GB is
presently enabled due to a mystery bug somewhere. The design of this
was heavily inspired by the alpha pmap.c.
- The kernel is moved into the negative address space(!).
- The kernel has 2GB of KVM available.
- Provide a uma memory allocator to use the direct map region to take
advantage of the 2MB TLBs.
- Fixed some assumptions in the bus_space macros about the ability
to fit virtual addresses in an 'int'.

Notable missing things:
- pmap_growkernel() should be able to grow to 512GB of KVM by expanding
downwards below kernbase. The kernel must be at the top 2GB of the
negative address space because of gcc code generation strategies.
- need to fix the >512GB user vm code.

Approved by: re (blanket)


# 115006 14-May-2003 peter

Collect the nastiness for preserving the kernel MSR_GSBASE around the
load_gs() calls into a single place that is less likely to go wrong.

Eliminate the per-process context switching of MSR_GSBASE, because it
should be constant for a single cpu. Instead, save/restore it during
the loading of the new %gs selector for the new process.

Approved by: re (amd64/* blanket)


# 114987 14-May-2003 peter

Add BASIC i386 binary support for the amd64 kernel. This is largely
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls. It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4. The ia64 code has not implemented signal delivery, so I had
to do that.

Before you say it, yes, this does need to go in a common place. But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.

On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure. The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting. The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes. This still needs
to be optimized.

Approved by: re (amd64/* blanket)


# 114952 12-May-2003 peter

For the page fault handler, save %cr2 in the outer trap handler so that
we do not have to run so long with interrupts disabled. This involved
creating tf_addr in the trapframe. Reorganize the trap stubs so that
they consistently reserve the stack space and initialize any missing
bits.

Approved by: re (amd64 stuff)


# 114928 12-May-2003 peter

Give a %fs and %gs to userland. Use swapgs to obtain the kernel %GS.base
value on entry and exit. This isn't as easy as it sounds because when
we recursively trap or interrupt, we have to avoid duplicating the
swapgs instruction or we end up back with the userland %gs. I implemented
this by testing TF_CS to see if we're coming from supervisor mode
already, and check for returning to supervisor. To avoid a race with
interrupts in the brief period after beginning executing the handler and
before the swapgs, convert all trap gates to interrupt gates, and reenable
interrupts immediately after the swapgs. I am not happy with this.
There are other possible ways to do this that should be investigated.
(eg: storing the GS.base MSR value in the trapframe)

Add some sysarch functions to let the userland code get to this.

Approved by: re (blanket amd64/*)


# 114918 11-May-2003 peter

Export PML4SHIFT and PDPSHIFT

Approved by: re (blanket amd64/*)


# 114349 30-Apr-2003 peter

Commit MD parts of a loosely functional AMD64 port. This is based on
a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to
attempt to get a stable base to start from. There is a lot missing still.
Worth noting:
- The kernel runs at 1GB in order to cheat with the pmap code. pmap uses
a variation of the PAE code in order to avoid having to worry about 4
levels of page tables yet.
- It boots in 64 bit "long mode" with a tiny trampoline embedded in the
i386 loader. This simplifies locore.s greatly.
- There are still quite a few fragments of i386-specific code that have
not been translated yet, and some that I cheated and wrote dumb C
versions of (bcopy etc).
- It has both int 0x80 for syscalls (but using registers for argument
passing, as is native on the amd64 ABI), and the 'syscall' instruction
for syscalls. int 0x80 preserves all registers, 'syscall' does not.
- I have tried to minimize looking at the NetBSD code, except in a couple
of places (eg: to find which register they use to replace the trashed
%rcx register in the syscall instruction). As a result, there is not a
lot of similarity. I did look at NetBSD a few times while debugging to
get some ideas about what I might have done wrong in my first attempt.


# 113621 17-Apr-2003 jhb

Remove a couple of unused symbols.


# 113339 10-Apr-2003 julian

Move the _oncpu entry from the KSE to the thread.
The entry in the KSE still exists but it's purpose will change a bit
when we add the ability to lock a KSE to a cpu.


# 111372 23-Feb-2003 jake

Previous commit missed a 1 that should be NGPTD, and an NPDEPG that should
be NPDEPTD. Grumble.

Sponsored by: DARPA, Network Associates Laboratories


# 111363 23-Feb-2003 jake

- Added macros NPGPTD, NBPTD, and NPDEPTD, for dealing with the size of the
page directory.
- Use these instead of the magic constants 1 or PAGE_SIZE where appropriate.
There are still numerous assumptions that the page directory is exactly
1 page.

Sponsored by: DARPA, Network Associates Laboratories


# 111299 23-Feb-2003 jake

- Added macros PDESHIFT and PTESHIFT, use these instead of magic constants
in locore.
- Removed the macros PTESIZE and PDESIZE, use sizeof instead in C.

Sponsored by: DARPA, Network Associates Laboratories


# 111032 17-Feb-2003 julian

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


# 110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


# 109877 26-Jan-2003 davidxu

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


# 107521 02-Dec-2002 deischen

Align the FPU state in the ucontext and sigcontext to 16 bytes
to accomodate the new SSE/XMM floating point save/restore
instructions.

This commit is mostly from bde and includes some style nits.

Approved by: re (jhb)


# 106542 06-Nov-2002 davidxu

1.Fix smp race between kernel vm86 BIOS calling and userland vm86 mode code,
remove global variable in_vm86call, set vm86 calling flag in PCB flags.

2.Fix vm86 BIOS calling preempted problem by changing vm86_lock mutex type
from MTX_DEF to MTX_SPIN. vm86pcb is not remembered in thread struct,
when the thread calling vm86 BIOS is preempted by interrupt thread,
and later switching back to the thread would cause incorrect context be
loaded into CPU registers, this leads to kernel crash.


# 105950 25-Oct-2002 peter

Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re


# 103407 16-Sep-2002 mini

Add kernel support needed for the KSE-aware libpthread:
- Maintain fpu state across signals.
- Use ucontext_t's to store KSE thread state.
- Synthesize state for the UTS upon each upcall, rather than
saving and copying a trapframe.
- Save and restore FPU state properly in ucontext_t's.

Reviewed by: deischen, julian
Approved by: -arch


# 99890 12-Jul-2002 dillon

Re-enable the idle page-zeroing code. Remove all IPIs from the idle
page-zeroing code as well as from the general page-zeroing code and use a
lazy tlb page invalidation scheme based on a callback made at the end
of mi_switch.

A number of people came up with this idea at the same time so credit
belongs to Peter, John, and Jake as well.

Two-way SMP buildworld -j 5 tests (second run, after stabilization)
2282.76 real 2515.17 user 704.22 sys before peter's IPI commit
2266.69 real 2467.50 user 633.77 sys after peter's commit
2232.80 real 2468.99 user 615.89 sys after this commit

Reviewed by: peter, jhb
Approved by: peter


# 99887 12-Jul-2002 jhb

Set the thread state of the newly chosen to run thread to TDS_RUNNING in
choosethread() in MI C code instead of doing it in in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.

Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64


# 99742 10-Jul-2002 dillon

Remove the critmode sysctl - the new method for critical_enter/exit (already
the default) is now the only method for i386.

Remove the paraphanalia that supported critmode. Remove td_critnest, clean
up the assembly, and clean up (mostly remove) the old junk from
cpu_critical_enter() and cpu_critical_exit().


# 99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 93264 27-Mar-2002 dillon

Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


# 91328 26-Feb-2002 dillon

revert last commit temporarily due to whining on the lists.


# 91315 26-Feb-2002 dillon

STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().

This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not effected by this change.

This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.

This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.

The following changes have been made:

* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.

Other architectures are unaffected.

* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather then hardcoded
in MI code.

The new code places the burden of entering the critical section
in the trampoline code where it belongs.

* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.

* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.

* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.

* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.

* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.

* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to lock; bus cycles modifying them.

Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.

* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.

Performance issues:

One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.

The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather then defer.

The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).

The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().

Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.


# 90776 17-Feb-2002 deischen

Use struct __ucontext in prototypes and associated functions instead of
ucontext_t. Forward declare struct __ucontext in <sys/signal.h> and
remove reliance on <sys/ucontext.h> being included.

While I'm here, also hide osigcontext types from userland; suggested
by bde.

Namespace pollution noticed by: Kevin Day <toasty@shell.dragondata.com>


# 90344 07-Feb-2002 phk

GC the PC_SWITCH* symbols which are not used in assembly anymore.


# 88088 17-Dec-2001 jhb

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().

Tested on: i386, alpha


# 87702 11-Dec-2001 jhb

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


# 85449 24-Oct-2001 jhb

Split the per-process Local Descriptor Table out of the PCB and into
struct mdproc.

Submitted by: Andrew R. Reiter <arr@watson.org>
Silence on: -current


# 84615 07-Oct-2001 nyan

Rewrite the pc98 bus_space stuff.

The type of bus_space_tag_t is now a pointer to bus_space_tag structure,
and the bus_space_tag structure saves pointers to functions for direct
access and relocate access.

Added bsh_bam member to the bus_space_handle structure, it saves access
method either direct access or relocate access which is called by
bus_space_* functions.

Added the mecia device support. If the bs_da and bs_ra in bus tag are set
NEPC_io_space_tag and NEPC_mem_space_tag respectively, new bus_space stuff
changes the register of mecia automatically for 16bit access.

Obtained from: NetBSD/pc98


# 83651 18-Sep-2001 peter

Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 82309 25-Aug-2001 peter

Optionize UPAGES for the i386. As part of this I split some of the low
level implementation stuff out of machine/globaldata.h to avoid exposing
UPAGES to lots more places. The end result is that we can double
the kernel stack size with 'options UPAGES=4' etc.

This is mainly being done for the benefit of a MFC to RELENG_4 at some
point. -current doesn't really need this so much since each interrupt
runs on its own kstack.


# 79609 12-Jul-2001 peter

Activate SSE/SIMD. This is the extra context switching support that
we are required to do if we let user processes use the extra 128 bit
registers etc.

This is the base part of the diff I got from:
http://www.issei.org/issei/FreeBSD/sse.html
I believe this is by: Mr. SUZUKI Issei <issei@issei.org>
SMP support apparently by: Takekazu KATO <kato@chino.it.okayama-u.ac.jp>
Test code by: NAKAMURA Kazushi <kaz@kobe1995.net>, see
http://kobe1995.net/~kaz/FreeBSD/SSE.en.html

I have fixed a couple of style(9) deviations. I have some followup
commits to fix a couple of non-style things.


# 76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


# 75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


# 75256 06-Apr-2001 jhb

Axe the per-cpu variable witness_spin_check as it was replaced by the
per-cpu spinlocks list.


# 74902 28-Mar-2001 jhb

Catch up to the mtx_saveintr -> mtx_savecrit change.


# 72930 22-Feb-2001 peter

Activate USER_LDT by default. The new thread libraries are going to
depend on this. The linux ABI emulator tries to use it for some linux
binaries too. VM86 had a bigger cost than this and it was made default
a while ago.

Reviewed by: jhb, imp


# 72746 20-Feb-2001 jhb

- Don't call clear_resched() in userret(), instead, clear the resched flag
in mi_switch() just before calling cpu_switch() so that the first switch
after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
proc in mi_switch(), and handle the sched_lock state change across a
context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
have to setup an initial state for sched_lock in fork_exit() before we
release it.


# 72376 11-Feb-2001 jake

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# 72276 10-Feb-2001 jhb

- Make astpending and need_resched process attributes rather than CPU
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accesssed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.


# 71817 30-Jan-2001 peter

Remove unused GD_CPU_LOCKID, GD_OTHER_CPUS, PS_IDLESTACK and
PS_IDLESTACK_TOP


# 71337 21-Jan-2001 jake

Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By: peter


# 71318 21-Jan-2001 jake

Remove the per-cpu pages used for copy and zero-ing pages of memory
for SMP; just use the same ones as UP. These weren't used without
holding Giant anyway, and the routines that use them would have to
be protected from pre-emption to avoid migrating cpus.


# 71294 20-Jan-2001 jake

Rename the ASSYM MTX_RECURSE to MTX_RECURSECNT in order to not conflict
with the flag of the same name.


# 71292 20-Jan-2001 jake

Simplify the i386 asm MTX_{ENTER,EXIT} macros to just call the
appropriate function, rather than doing a horse-and-buggy
acquire. They now take the mutex type as an arg and can be
used with sleep as well as spin mutexes.


# 70952 12-Jan-2001 jake

Remove unused per-cpu variables inside_intr and ss_eflags.


# 70714 06-Jan-2001 jake

Use %fs to access per-cpu variables in uni-processor kernels the same
as multi-processor kernels. The old way made it difficult for kernel
modules to be portable between uni-processor and multi-processor
kernels. It is no longer necessary to jump through hoops.

- always load %fs with the private segment on entry to the kernel
- change the type of the self referntial pointer from struct privatespace
to struct globaldata
- make the globaldata symbol have value 0 in all cases, so the symbols
in globals.s are always offsets, not aliases for fields in globaldata
- define the globaldata space used for uniprocessor kernels in C, rather
than assembler
- change the assmebly language accessors to use %fs, add a macro
PCPU_ADDR(member, reg), which loads the register reg with the address
of the per-cpu variable member


# 70006 14-Dec-2000 jake

Use _lapic+offset to access the local apic from assembly language
files, rather than the symbols in globals.s. The offsets are
generated by genassym.


# 69919 12-Dec-2000 jhb

Add in symbols needed in the WITNESS_ENTER and WITNESS_EXIT macros in
i386/include/mutex.h.


# 67899 29-Oct-2000 phk

Remove unneeded <stddef.h> #includes.


# 67358 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Catch up to the MI mutex structure due to saveflags,saveipl,savepsr
becoming saveintr.


# 67011 12-Oct-2000 bde

Moved the definitions of AST_PENDING and AST_RESCHED to the correct place.


# 66716 06-Oct-2000 jhb

- Change fast interrupts on x86 to push a full interrupt frame and to
return through doreti to handle ast's. This is necessary for the
clock interrupts to work properly.
- Change the clock interrupts on the x86 to be fast instead of threaded.
This is needed because both hardclock() and statclock() need to run in
the context of the current process, not in a separate thread context.
- Kill the prevproc hack as it is no longer needed.
- We really need Giant when we call psignal(), but we don't want to block
during the clock interrupt. Instead, use two p_flag's in the proc struct
to mark the current process as having a pending SIGVTALRM or a SIGPROF
and let them be delivered during ast() when hardclock() has finished
running.
- Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways. It was
broken on the x86 if it was turned on since cpl is gone. It's only use
was to bogusly run softclock() directly during hardclock() rather than
scheduling an SWI.
- Remove the COM_LOCK simplelock and replace it with a clock_lock spin
mutex. Since the spin mutex already handles disabling/restoring
interrupts appropriately, this also lets us axe all the *_intr() fu.
- Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use
temporary fast interrupts for the APIC trial.
- Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending
signals in hardclock() that are to be delivered in ast().

Submitted by: jakeb (making statclock safe in a fast interrupt)
Submitted by: cp (concept of delaying signals until ast())


# 65557 06-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


# 58717 28-Mar-2000 dillon

Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).

A great deal of conditional SMP code for various deadended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.

Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.

This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.

This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch


# 58345 20-Mar-2000 phk

Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users: Greg has not had time to test this yet, be careful.


# 55604 08-Jan-2000 bde

Compile genassym.c with ordinary ${CFLAGS}. The (small) needs for
${GEN_CFLAGS} and -U_KERNEL became negative when all all the
genassym.c's were converted to be cross-built.

Makefile.*:
- Cleanups associated with the old genassym.
- Fixed deprecated spelling of ${.IMPSRC} as "$<".


# 55545 07-Jan-2000 marcel

Use genassym(1). The definitions of NKPDE and NKPT have been removed
because they are already defined in pmap.h, resulting in duplicate
definitions.

Reviewed by: bde


# 55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# 54188 06-Dec-1999 luoqi

User ldt sharing.


# 52140 11-Oct-1999 luoqi

Add a per-signal flag to mark handlers registered with osigaction, so we
can provide the correct context to each signal handler.

Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).

Move ps_sigstk from to p_sigacts to the main proc structure since signal
stack should not be shared among threads.

Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.

Reviewed by: marcel, jdp, bde


# 51984 07-Oct-1999 marcel

Simplification of the signal trampoline and other cleanups.

o Remove unused defines from genassym.c that were needed
by the trampoline.
o Add load_gs_param function to support.s that catches
a fault when %gs is loaded with an invalid descriptor.
The function returns EFAULT in that case.
o Remove struct trapframe from mcontext_t and replace it
with the list of registers.
o Modify sendsig and sigreturn accordingly.

This commit contains a patch by bde.

Reviewed by: luoqi, jdp


# 51942 04-Oct-1999 marcel

Re-introduction of sigcontext.

struct sigcontext and ucontext_t/mcontext_t are defined in such
a way that both (ie struct sigcontext and ucontext_t) can be
passed on to sigreturn. The signal handler is still given a
ucontext_t for maximum flexibility.

For backward compatibility sigreturn restores the state for the
alternate signal stack from sigcontext.sc_onstack and not from
ucontext_t.uc_stack. A good way to determine which value the
application has set and thus which value to use, is still open
for discussion.

NOTE: This change should only affect those binaries that use
sigcontext and/or ucontext_t. In the source tree itself
this is only doscmd. Recompilation is required for those
applications.

This commit also fixes a lot of style bugs without hopefully
adding new ones.

NOTE: struct sigaltstack.ss_size now has type size_t again. For
some reason I changed that into unsigned int.

Parts submitted by: bde
sigaltstack bug found by: bde


# 51792 29-Sep-1999 marcel

sigset_t change (part 3 of 5)
-----------------------------

By introducing a new sigframe so that the signal handler operates
on the new siginfo_t and on ucontext_t instead of sigcontext, we
now need two version of sendsig and sigreturn.

A flag in struct proc determines whether the process expects an
old sigframe or a new sigframe. The signal trampoline handles
which sigreturn to call. It does this by testing for a magic
cookie in the frame.

The alpha uses osigreturn to implement longjmp. This means that
osigreturn is not only used for compatibility with existing
binaries. To handle the new sigset_t, setjmp saves it in
sc_reserved (see NOTE).

the struct sigframe has been moved from frame.h to sigframe.h
to handle the complex header dependencies that was caused by
the new sigframe.

NOTE: For the i386, the size of jmp_buf has been increased to hold
the new sigset_t. On the alpha this has been prevented by
using sc_reserved in sigcontext.


# 51065 07-Sep-1999 luoqi

Save %gs in sigcontext when delivering a signal and restore them upon
return (in signal trampoline code). I plan to do the same on -stable,
so that we have a consistent interface to userland applications.

Reviewed by: bde


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 50037 18-Aug-1999 peter

Update for MI switch code, and trim a heap of unused (I believe) entries.


# 49207 29-Jul-1999 peter

GBIOSSTACK_SEL is undefined, but OTOH, BSSSEL apparently isn't used either.


# 49204 29-Jul-1999 msmith

Remove some duplicate definitions, as suggested by Alan Cox.


# 48691 09-Jul-1999 jlemon

Implement support for hardware debug registers on the i386.

Submitted by: Brian Dean <brdean@unx.sas.com>


# 48621 06-Jul-1999 cracauer

Implement SA_SIGINFO for i386. Thanks to Bruce Evans for much more
than a review, this was a nice puzzle.

This is supposed to be binary and source compatible with older
applications that access the old FreeBSD-style three arguments to a
signal handler.

Except those applications that access hidden signal handler arguments
bejond the documented third one. If you have applications that do,
please let me know so that we take the opportunity to provide the
functionality they need in a documented manner.

Also except application that use 'struct sigframe' directly. You need
to recompile gdb and doscmd. `make world` is recommended.

Example program that demonstrates how SA_SIGINFO and old-style FreeBSD
handlers (with their three args) may be used in the same process is at
http://www3.cons.org/tmp/fbsd-siginfo.c

Programs that use the old FreeBSD-style three arguments are easy to
change to SA_SIGINFO (although they don't need to, since the old style
will still work):

Old args to signal handler:
void handler_sn(int sig, int code, struct sigcontext *scp)

New args:
void handler_si(int sig, siginfo_t *si, void *third)
where:
old:code == new:second->si_code
old:scp == &(new:si->si_scp) /* Passed by value! */

The latter is also pointed to by new:third, but accessing via
si->si_scp is preferred because it is type-save.

FreeBSD implementation notes:
- This is just the framework to make the interface POSIX compatible.
For now, no additional functionality is provided. This is supposed
to happen now, starting with floating point values.
- We don't use 'sigcontext_t.si_value' for now (POSIX meant it for
realtime-related values).
- Documentation will be updated when new functionality is added and
the exact arguments passed are determined. The comments in
sys/signal.h are meant to be useful.

Reviewed by: BDE


# 48308 28-Jun-1999 peter

Use the same -UKERNEL strategy as the alpha to avoid the inlines etc.


# 47678 01-Jun-1999 jlemon

Unifdef VM86.

Reviewed by: silence on on -current


# 47081 12-May-1999 luoqi

Unbreak VESA on SMP.


# 47080 12-May-1999 luoqi

VM86_FRAMESIZE is now the size of vm86 frame, not the number of 4-byte words.

Requested by: Bruce


# 47022 11-May-1999 luoqi

Do not hardcode size of struct vm86frame.

Submitted by: Jonathan Lemon <jlemon@americantv.com>


# 46129 27-Apr-1999 luoqi

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fask_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


# 45252 02-Apr-1999 alc

Put in place the infrastructure for improved UP and SMP TLB management.

In particular, replace the unused field pmap::pm_flag by pmap::pm_active,
which is a bit mask representing which processors have the pmap activated.
(Thus, it is a simple Boolean on UPs.)

Also, eliminate an unnecessary memory reference from cpu_switch()
in swtch.s.

Assisted by: John S. Dyson <dyson@iquest.net>
Tested by: Luoqi Chen <luoqi@watermarkgroup.com>,
Poul-Henning Kamp <phk@critter.freebsd.dk>


# 44327 28-Feb-1999 bde

Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu,
not per-process. Keep it in `switchtime' consistently.

It is now clear that the timestamp is always valid in fork_trampoline()
except when the child is running on a previously idle cpu, which
can only happen if there are multiple cpus, so don't check or set
the timestamp in fork_trampoline except in the (i386) SMP case.
Just remove the alpha code for setting it unconditionally, since
there is no SMP case for alpha and the code had rotted.

Parts reviewed by: dfr, phk


# 44215 22-Feb-1999 bde

Added a per-cpu variable `switchticks' for use in scheduling.


# 40081 08-Oct-1998 msmith

Fix up the kernel environment and module data pointers in the bootinfo if
they are present.
If we are told where the end of the loaded kernel image is, believe it.


# 39613 24-Sep-1998 bde

Don't redefine kernel. Makefile.i386 now defines it.
Removed some unused includes.


# 38422 18-Aug-1998 msmith

Presently there is only one `currentldt' variable for all cpus
in a SMP system. Unexpected things could happen if each cpu
has a different ldt setting and one cpu tries to use value
of currentldt set by another cpu.

The fix is to move currentldt to the per-cpu area. It includes
patches I filed in PR i386/6219 which are also user ldt related.

PR: i386/7591, i386/6219
Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>


# 37564 11-Jul-1998 bde

Fixed printf format errors.

Use offsetof() instead of null pointer hacks.


# 36441 28-May-1998 phk

Some cleanups related to timecounters and weird ifdefs in <sys/time.h>.

Clean up (or if antipodic: down) some of the msgbuf stuff.

Use an inline function rather than a macro for timecounter delta.

Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.

Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()

This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.

WARNING: Programs which muck about with struct proc in userland
will have to be fixed.

Reviewed, but found imperfect by: bde


# 36138 17-May-1998 tegge

Change simple lock handling to not depend upon having a local apic
available. The per-cpu variable ss_tpr has been replaced by ss_eflags.
This reduced the number of interrupts sent to the wrong CPU, due to
the cpu having the global lock being inside a critical region.

Remove some unneeded manipulation of tpr register in mplock.s.

Adjust code in mplock.s to be aware of variables on the stack being
destroyed by MPgetlock if GRAB_LOPRIO is defined.


# 36135 17-May-1998 tegge

Add forwarding of roundrobin to other cpus. This gives a more regular
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).

Don't attempt to switch to the BSP in boot(). If the system was idle when
an interrupt caused a panic, this won't work. Instead, switch to the BSP
in cpu_reset.

Remove some spurious forward_statclock/forward_hardclock warnings.


# 36125 17-May-1998 tegge

For SMP, use prv_PPAGE1/prv_PMAP1 instead of PADDR1/PMAP1.
get_ptbase and pmap_pte_quick no longer generates IPIs.
This should reduce the number of IPIs during heavy paging.


# 35087 06-Apr-1998 peter

Fix VM86 compiles. a #include "opt_vm86.h" was missing, and the my_tr
variable was needed in the non-SMP case.

Submitted by: Jonathan Lemon <jlemon@americantv.com>


# 35071 06-Apr-1998 peter

Generate #defines that the asm code can access for the per-cpu data
structures.


# 35029 04-Apr-1998 phk

Time changes mark 2:

* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.

* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)

* nanosleep, select & poll takes long sleeps one day at a time

Reviewed by: bde
Tested by: ache and others


# 32992 01-Feb-1998 bde

Declare printf() instead of including <stdio.h>, so that this doesn't
depend on anything outside of "sys".

Removed an unused include.

Don't use `extern' in a function declaration.


# 30273 10-Oct-1997 peter

GPROC0_SEL isn't used in any *.s files it seems..


# 30265 10-Oct-1997 peter

Convert the VM86 option from a global option to an option only depended
on by the files that use it. Changing the VM86 option now only causes
a recompile of a dozen files or so rather than the entire kernel.


# 27993 08-Aug-1997 dyson

VM86 kernel support.
Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>,
Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>,
and probably alot of others.
Submitted by: Jnathan Lemon <jlemon@americantv.com>


# 26494 07-Jun-1997 bde

Preserve %fs and %gs across context switches. This has a relatively low
cost since it is only done in cpu_switch(), not for every exception.
The extra state is kept in the pcb, and handled much like the npx state,
with similar deficiencies (the state is not preserved across signal
handlers, and error handling loses state).


# 25648 10-May-1997 bde

Cleaned up #includes. Lite2 cleaned up <sys/mount.h> so no kludges
are required for NFS now.

Ifdefed SMP #defines.


# 25164 26-Apr-1997 peter

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


# 24691 07-Apr-1997 peter

The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.


# 24690 07-Apr-1997 peter

No longer use an i386tss as the basis of our pcb - it wasn't particularly
convenient and makes life difficult for my next commit. We still need
an i386tss to point to for the tss slot in the gdt, so we use a common
tss shared between all processes.

Note that this is going to break debugging until this series of commits
is finished. core dumps will change again too. :-( we really need
a more modern core dump format that doesn't depend on the pcb/upages.

This change makes VM86 mode harder, but the following commits will remove
a lot of constraints for the VM86 system, including the possibility of
extending the pcb for an IO port map etc.

Obtained from: bde


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 18892 12-Oct-1996 bde

Removed nested include if <sys/socket.h> from <net/if.h> and
<net/if_arp.h> and fixed the things that depended on it. The nested
include just allowed unportable programs to compile and made my
simple #include checking program report that networking code doesn't
need to include <sys/socket.h>.


# 17371 31-Jul-1996 bde

Eliminated pcb_inl. It was always 0 because context switches don't occur
in interrupt handlers.


# 17366 31-Jul-1996 dg

Converted timer/run queues to 4.4BSD queue style. Removed old and unused
sleep(). Implemented wakeup_one() which may be used in the future to combat
the "thundering herd" problem for some special cases.

Reviewed by: dyson


# 15565 02-May-1996 phk

Move atdevbase out of locore.s and into machdep.c
Macroize locore.s' page table setup even more, now it's almost readable.
Rename PG_U to PG_A (so that I can...)
Rename PG_u to PG_U. "PG_u" was just too ugly...
Remove some unused vars in pmap.c
Remove PG_KR and PG_KW
Remove SSIZE
Remove SINCR
Remove BTOPKERNBASE

This concludes my spring cleaning, modulus any bug fixes for messes I
have made on the way.

(Funny to be back here in pmap.c, that's where my first significant
contribution to 386BSD was... :-)


# 15543 02-May-1996 phk

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


# 15231 13-Apr-1996 bde

Generate #define of PCB_SAVEFPU_SIZE for use in savectx().


# 14836 27-Mar-1996 bde

Eliminated dependency on opt_sysvipc.h.


# 14595 12-Mar-1996 dg

Killed some historical #define cruft that we've never used in FreeBSD:

UDOT_SZ
SYSPTSIZE
USRPTSIZE
MSGBUFPTECNT
DMMIN
DMMAX
DMTEXT
USRIOSIZE
VM_PHYS_SIZE


# 13226 04-Jan-1996 wollman

Convert SYSV IPC to new-style options. (I hope I got everything...)
The LKMs will need an extra file, to come later.


# 12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


# 12607 03-Dec-1995 bde

Completed function declarations and/or added prototypes.


# 10092 17-Aug-1995 dg

Killed some unused stuff inherited from Bill Jolitz. Note that since
this changes the size of the pcb struct, gdb will need to be rebuilt
or debugging won't work correctly.

Reviewed by: Bruce Evans


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 8748 25-May-1995 dg

Made "NMBCLUSTERS" calculation dynamic and fixed bogus use of "NMBCLUSTERS"
in machdep.c (it should use the global nmbclusters). Moved the calculation
of nmbclusters into conf/param.c (same place where nmbclusters has always
been assigned), and made the calculation include an extra amount based
on "maxusers". NMBCLUSTERS can still be overrided in the kernel config
file as always, but this change will make that generally unnecessary. This
fixes the "bug" reports from people who have misconfigured kernels seeing
the network hang when the mbuf cluster pool runs out.

Reviewed by: John Dyson


# 6377 14-Feb-1995 phk

Removed a YF comment.


# 6354 14-Feb-1995 phk

Yves has sent us a ~600 Kb patch, which shuts up gcc entirely for the
entire kernel.
Unfortunately we didn't send him a copy of the style guide before he did it.
I'm trying to find all the benign and downright sound bits and will commit
them without any other explanation than "YF fix" if they are merely cosmetic.

Reviewed by: phk
Submitted by: yves@dutncp8.tn.tudelft.nl (Yves Fonk)


# 5908 25-Jan-1995 bde

Load the kernel symbol table in the boot loader and not at compile time.
(Boot with the -D flag if you want symbols.)

Make it easier to extend `struct bootinfo' without losing either forwards
or backwards compatibility.

ddb_aout.c:
Get the symbol table from wherever the loader put it.
Nuke db_symtab[SYMTAB_SPACE].

boot.c:
Enable loading of symbols. Align them on a page boundary. Add printfs
about the symbol table sizes.
Pass the memory sizes to the kernel.
Fix initialization of `unit' (it got moved out of the loop).
Fix adding the bss size (it got moved inside an ifdef).
Initialize serial port when RB_SERIAL is toggled on.
Fix comments.
Clean up formatting of recently added code.

io.c:
Clean up formatting of recently added code.

netboot/main.c, machdep.c, wd.c:
Change names of bootinfo fields.

LINT:
Nuke SYMTAB_SPACE.
Fix comment about DODUMP.

Makefile.i386:
Nuke use of dbsym.
Exclude gcc symbols from kernel unless compiling with -g.
Remove unused macro.
Fix comments and formatting.

genassym.c:
Generate defines for some new bootinfo fields. Change names of old ones.

locore.s:
Copy only the valid part of the `struct bootinfo' passed by the loader.
Reserve space for symbol table, if any.

machdep.c:
Check the memory sizes passed by the loader, if any. Don't use them yet.

bootinfo.h:
Add a size field so that we can resolve some mismatches between the loader
bootinfo and the kernel boot info. The version number is not so good for
this because of historical botches and because it's harder to maintain.
Add memory size and symbol table fields. Change the names of everything.

Hacks to save a few bytes:

asm.S, boot.c, boot2.S:
Replace `ouraddr' by `(BOOTSEG << 4)'.

boot.c:
Don't statically initialize `loadflags' to 0. Disable the "REDUNDANT"
code that skips the BIOS variables. Eliminate `total'. Combine some
more printfs.

boot.h, disk.c, io.c, table.c:
Move all statically initialzed data to table.c.

io.c:
Don't put the A20 gate bits in a variable.


# 5770 21-Jan-1995 bde

Remove unused definitions of vm statistics counters. Most of the
counting is now done in C. There are still about 100 unused
definitions for other things.


# 4929 03-Dec-1994 bde

i386/exception.s,
Keep track of interrupt nesting level. It is normally 0
for syscalls and traps, but is fudged to 1 for their exit
processing in case they metamorphose into an interrupt
handler.

i386/genassym.c;
Remove support for the obsolete pcb_iml and pcb_cmap2.

Add support for pcb_inl.

i386/swtch.s:
Fudge the interrupt nesting level across context switches and in
the idle loop so that the work for preemptive context switches
gets counted as interrupt time, the work for voluntary context
switches gets counted mostly as system time (the part when
curproc == 0 gets counted as interrupt time), and only truly idle
time gets counted as idle time.

Remove obsolete support (commented out and otherwise) for pcb_iml.

Load curpcb just before curproc instead of just after so that
curpcb is always valid if curproc is. A few more changes like
this may fix tracing through context switches.

Remove obsolete function swtch_to_inactive().

include/cpu.h:
Use the new interrupt nesting level variable to implement a
non-fake CLF_INTR() so that accounting for the interrupt state
works.

You can use top, iostat or (best) an up to date systat to see
interrupt overheads. I see the expected huge interrupt overheads
for ISA devices (on a 486DX/33, about 55% for an IDE drive
transferring 1250K/sec and the same for a WD8013EBT network card
transferring 1100K/sec). The huge interrupt overheads for serial
devices are unfortunately normally invisible.

include/pcb.h:
Remove the obsolete pcb_iml and pcb_cmap2. Replace them by
padding to preserve binary compatibility.

Use part of the new padding for pcb_inl.

isa/icu.s:
isa/vector.s:
Keep track of interrupt nesting level.


# 4600 18-Nov-1994 phk

Grap the bootinfo structure the bootblock passes us.


# 3919 26-Oct-1994 bde

Fix compiler warnings.


# 3889 26-Oct-1994 jkh

Fix two very minor nits, one of which caused a warning (no return type for
main).


# 3612 15-Oct-1994 dg

1) Some of the counters in the vmmeter struct don't fit well into the Mach VM
scheme of things, so I've changed them to be more appropriate. page in/ous
are now associated with the pager that did them. Nuked v_fault as the
only fault of interest that wouldn't be already counted in v_trap is a VM
fault, and this is counted seperately.
2) Implemented most of the remaining counters and corrected the counting of
some that were done wrong. They are all almost correct now...just a few
minor ones left to fix.


# 3451 09-Oct-1994 dg

Got rid of map.h. It's a leftover from the rmap code, and we use rlists.
Changed swapmap into swaplist.


# 3384 06-Oct-1994 rgrimes

1. Eliminate unused esym global from locore, our boot code never supported
that and when it does it will be done differently.

2. The kernel now does a frame setup on entry so it ``looks'' like a
real function call. This will be needed by future boot code and
debuggers.

3. Clean up stack offsets to all be in decimal and use %ebp when copying
parameters in from the boot code.

4. Implement version 1 of the uniform boot code passing mechanism with
support for kernelname passing and nfs_diskless structure passing.

5. Document the 3 different ways the kernel is called depending on what code
is calling it.


# 3294 02-Oct-1994 rgrimes

If you are building a kernel without NFS statically linked in you
must #define NFS before including <sys/mount.h> to pick up some of
the definitions needed for struct diskless. Be sure to undef it after this
so you do not effect other code.

This is kinda sick, but it does the job. Problem found by davidg.


# 3291 02-Oct-1994 dg

"idle priority" support. Based on code from Henrik Vestergaard Draboel,
but substantially rewritten by me.


# 3283 01-Oct-1994 rgrimes

Add code to generate NFSDISKLESS_SIZE for use in locore for copying the
nfs_diskless structure in from the boot code.
Reviewed by: davidg


# 2689 12-Sep-1994 dg

Eliminated a whole pile of ancient (we're taking 4.3BSD) VM system
related #define constants. Corrected incorrect VM_MAX_KERNEL_ADDRESS.

Reviewed by: John Dyson


# 2457 02-Sep-1994 dg

Converted P_LINK -> P_FORW, P_RLINK -> P_BACK, minor optimization.


# 2441 01-Sep-1994 dg

Realtime priority scheduling support.

Submitted by: Henrik Vestergaard Draboel


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 974 14-Jan-1994 dg

"New" VM system from John Dyson & myself. For a run-down of the
major changes, see the log of any effected file in the sys/vm
directory (swap_pager.c for instance).


# 757 13-Nov-1993 dg

First steps in rewriting locore.s, and making info useful
when the machine panics.

i386/i386/locore.s:
1) got rid of most .set directives that were being used like
#define's, and replaced them with appropriate #define's in
the appropriate header files (accessed via genassym).
2) added comments to header inclusions and global definitions,
and global variables
3) replaced some hardcoded constants with cpp defines (such as
PDESIZE and others)
4) aligned all comments to the same column to make them easier to
read
5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to
/sys/i386/include/asmacros.h
6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code
7) added new global '_KERNend' to store last location+1 of kernel
8) cleaned up zeroing of bss so that only bss is zeroed
9) fix zeroing of page tables so that it really does zero them all
- not just if they follow the bss.
10) rewrote page table initialization code so that 1) works correctly
and 2) write protects the kernel text by default
11) properly initialize the kernel page directory, upages, p0stack PT,
and page tables. The previous scheme was more than a bit
screwy.
12) change allocation of virtual area of IO hole so that it is
fixed at KERNBASE + 0xa0000. The previous scheme put it
right after the kernel page tables and then later expected
it to be at KERNBASE +0xa0000
13) change multiple bogus settings of user read/write of various
areas of kernel VM - including the IO hole; we should never
be accessing the IO hole in user mode through the kernel
page tables
14) split kernel support routines such as bcopy, bzero, copyin,
copyout, etc. into a seperate file 'support.s'
15) split swtch and related routines into a seperate 'swtch.s'
16) split routines related to traps, syscalls, and interrupts
into a seperate file 'exception.s'
17) remove some unused global variables from locore that got
inserted by Garrett when he pulled them out of some .h
files.

i386/isa/icu.s:
1) clean up global variable declarations
2) move in declaration of astpending and netisr

i386/i386/pmap.c:
1) fix calculation of virtual_avail. It previously was calculated
to be right in the middle of the kernel page tables - not
a good place to start allocating kernel VM.
2) properly allocate kernel page dir/tables etc out of kernel map
- previously only took out 2 pages.

i386/i386/machdep.c:
1) modify boot() to print a warning that the system will reboot in
PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user
abort with a key on the console. The machine will wait for
ever if a key is typed before the reboot. The default is
15 seconds, but can be set to 0 to mean don't wait at all,
-1 to mean wait forever, or any positive value to wait for
that many seconds.
2) print "Rebooting..." just before doing it.

kern/subr_prf.c:
1) remove PANICWAIT as it is deprecated by the change to machdep.c

i386/i386/trap.c:
1) add table of trap type strings and use it to print a real trap/
panic message rather than just a number. Lot's of work to
be done here, but this is the first step. Symbolic traceback
is in the TODO.

i386/i386/Makefile.i386:
1) add support in to build support.s, exception.s and swtch.s

...and various changes to various header files to make all of the
above happen.


# 608 15-Oct-1993 rgrimes

genassym.c:
Remove NKMEMCLUSTERS, it is no longer define or used.

locores.s:
Fix comment on PTDpde and APTDpde to be pde instead of pte
Add new equation for calculating location of Sysmap
Remove Bill's old #ifdef garbage for counting up memory,
that stuff will never be made to work and was just cluttering
up the file.

Add code that places the PTD, page table pages, and kernel
stack below the 640k ISA hole if there is room for it, otherwise
put this stuff all at 1MB. This fixes the 28K bogusity in
the boot blocks, that can now go away!

Fix the caclulation of where first is to be dependent on
NKPDE so that we can skip over the above mentioned areas.
The 28K thing is now 44K in size due to the increase in
kernel virtual memory space, but since we no longer have
to worry about that this is no big deal.

Use if NNPX > 0 instead of ifdef NPX for floating point code.

machdep.c
Change the calculation of for the buffer cache to be
20% of all memory above 2MB and add back the upper limit
of 2/5's of the VM_KMEM_SIZE so that we do not eat ALL
of the kernel memory space on large memory machines, note
that this will not even come into effect unless you have
more than 32MB. The current buffer cache limit is 6.7MB
due to this caclulation.

It seems that we where erroniously allocating bufpages pages
for buffer_map. buffer_map is UNUSED in this implementation
of the buffer cache, but since the map is referenced in
several if statements a quick fix was to simply allocate
1 vm page (but no real memory) to it.

pmap.h
Remove rcsid, don't want them in the kernel files!

Removed some cruft inside an #ifdef DEBUGx that caused
compiler errors if you where compiling this for debug.

Use the #defines for PD_SHIFT and PG_SHIFT in place of
constants.

trap.c:
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Remove a now completly invalid check for a maximum virtual
address, the virtual address now ends at 0xFFFFFFFF so
there is no more MAX!! (Thanks David, I completly missed
that one!)

vm_machdep.c
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Replace several 0xFE00000 constants with KERNBASE


# 590 12-Oct-1993 rgrimes

Add Page Table Directory Indexes (NKPDE, KPTDI, PTDPTDI, APTDPTDI) to
be used to replace more constants in locore.


# 567 10-Oct-1993 rgrimes

Added PDRSHIFT and KERNSIZE so that the PDR offsets can be calculated in
locore.s instead of being constants (3F8, 3FA).


# 556 08-Oct-1993 rgrimes

All:
removed patch kit headers and sccsids, add $Id$. This is a general
clean up and reallignment with NetBSD-current where possible.

genassym.c:
removed extranious include of reg.h
removed old FP_* defines that have been ifdefed out since the patch kit
removed PCB_SIGC that is not referenced anywhere
add trapframe and sigframe defines
add KERNBASE define for use in locore.s

locore.s:
include npx.h and use NNPX for turning on and off FPU
include machine/cputypes.h for the types of cpu (used in cpu_identify)
change SYSPDREND to be one higher, this is really the base of the
next area, and will be changing again next time I revise the file
Reverse the NOP defines, you now get slow NOP's by default, this
may be what is casuing us trouble with some systems. If you want
the NOPS to be null you now need to have options DUMMY_NOPS.
Now get esym from the boot blocks which don't pass it yet, and
it is not used, but this will be changing.
Move the bit_colors stuff to be in with the rest of Bruces SHOW_A_LOT
things for debugging.
Added NetBSD's CPU type probe code, we now know what type of CPU
we are running on.
Adjust kernel pde calcuation to correct for change in SYSPDREND, no
longer need the +1.

machdep.c
include npx.h and use NNPX for turning on and off FPU
include isa.h, map.h(new file), exec.h in preperation for
changes that are still in process.
Add some of the code for MACHINE_NONCONTIG that will alow us
to better map around the BIOS memory area.
Now print the version, cpu id, real memory and availiable memory
during boot.
Correct the calculation of bufpages, the code was mixing pages
and bytes, it now does the right things. Removed Bill's hack
for limiting the erronous calculation.
add the identifycpu print out code from NetBSD.
remove the definition of the sigframe struct, it belongs in
frame.h
put in printf's about syncing disks on a halt/reboot.
Change the halted message to be a little easier reading.
Clean up of the dump messages, makes the source and the output
much more readable.
Change 0,0 in several places to have spaces after the commas.


# 5 12-Jun-1993 rgrimes

This commit was generated by cvs2svn to compensate for changes in r4,
which included commits to RCS files with non-trunk default branches.


# 4 12-Jun-1993 rgrimes

Initial import, 0.1 + pk 0.2.4-B1