Cross Reference: /freebsd-current/sys/amd64/amd64/cpu

History log of /freebsd-current/sys/amd64/amd64/cpu_switch.S
Revision	Date	Author	Comments
# 95ee2897	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: two-line .h pattern Remove /^\s\\n \*\s+\$FreeBSD\$$\n/
# df013409	03-Oct-2020	Konstantin Belousov <kib@FreeBSD.org>	amd64: Store full 64bit of FIP/FDP for 64bit processes when using XSAVE. If current process is 64bit, use rex-prefixed version of XSAVE (XSAVE64). If current process is 32bit and CPU supports saving segment registers cs/ds in the FPU save area, use non-prefixed variant of XSAVE. Reported and tested by: Michał Górny <mgorny@mgorny@moritz.systems> PR: 250043 Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D26643
# f446480b	23-Aug-2020	Konstantin Belousov <kib@FreeBSD.org>	amd64: Handle 5-level paging on wakeup. We can switch into long mode directly with LA57 enabled. Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273
# ea602083	20-May-2020	Konstantin Belousov <kib@FreeBSD.org>	amd64: Add a knob to flush RSB on context switches if machine has SMEP. The flush is needed to prevent cross-process ret2spec, which is not handled on kernel entry if IBPB is enabled but SMEP is present. While there, add i386 RSB flush. Reported by: Anthony Steinhauser <asteinhauser@google.com> Reviewed by: markj, Anthony Steinhauser Discussed with: philip admbugs: 961 Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 98158c75	10-Nov-2019	Konstantin Belousov <kib@FreeBSD.org>	amd64: move common_tss into pcpu. This saves some memory, around 256K I think. It removes some code, e.g. KPTI does not need to specially map common_tss anymore. Also, common_tss become domain-local. Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22231
# 5e921ff4	25-Oct-2019	Konstantin Belousov <kib@FreeBSD.org>	amd64: move pcb out of kstack to struct thread. This saves 320 bytes of the precious stack space. The only negative aspect of the change I can think of is that the struct thread increased by 320 bytes obviously, and that 320 bytes are not swapped out anymore. I believe the freed stack space is much more important than that. Also, current struct thread size is 1392 bytes on amd64, so UMA will allocate two thread structures per (4KB) slab, which leaves a space for pcb without increasing zone memory use. Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D22138
# 73dd84a7	28-Aug-2019	Mateusz Guzik <mjg@FreeBSD.org>	amd64: clean up cpu_switch.S - LK macro (conditional on SMP for the lock prefix) is unused - SETLK unnecessarily performs xchg. obtained value is never used and the implicit lock prefix adds avoidable cost. Barrier provided by it does not appear to be of any use. - the lock waited for is almost never blocked, yet the loop starts with a pause. Move it out of the common case. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19563
# a9262f49	16-Mar-2019	Konstantin Belousov <kib@FreeBSD.org>	amd64: rewrite cpu_switch.S fragment to reload tss.rsp0 on context switch. New code avoids jumps. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19514
# d1a07e31	13-Jun-2018	Konstantin Belousov <kib@FreeBSD.org>	Enable eager FPU context switch by default on amd64. With compilers making increasing use of vector instructions the performance benefit of lazily switching FPU state is no longer a desirable tradeoff. Linux switched to eager FPU context switch some time ago, and the idea was floated on the FreeBSD-current mailing list some years ago[1]. Enable eager FPU context switch by default on amd64, with a tunable/sysctl available to turn it back off. [1] https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055198.html Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation
# 27275f8a	26-Apr-2018	Tycho Nightingale <tychon@FreeBSD.org>	Expand the checks for UCR3 == PMAP_NO_CR3 to enable processes to be excluded from PTI. Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15100
# fc2a8776	20-Mar-2018	Ed Maste <emaste@FreeBSD.org>	Rename assym.s to assym.inc assym is only to be included by other .s files, and should never actually be assembled by itself. Reviewed by: imp, bdrewery (earlier) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D14180
# bd50262f	17-Jan-2018	Konstantin Belousov <kib@FreeBSD.org>	PTI for amd64. The implementation of the Kernel Page Table Isolation (KPTI) for amd64, first version. It provides a workaround for the 'meltdown' vulnerability. PTI is turned off by default for now, enable with the loader tunable vm.pmap.pti=1. The pmap page table is split into kernel-mode table and user-mode table. Kernel-mode table is identical to the non-PTI table, while usermode table is obtained from kernel table by leaving userspace mappings intact, but only leaving the following parts of the kernel mapped: kernel text (but not modules text) PCPU GDT/IDT/user LDT/task structures IST stacks for NMI and doublefault handlers. Kernel switches to user page table before returning to usermode, and restores full kernel page table on the entry. Initial kernel-mode stack for PTI trampoline is allocated in PCPU, it is only 16 qwords. Kernel entry trampoline switches page tables. then the hardware trap frame is copied to the normal kstack, and execution continues. IST stacks are kept mapped and no trampoline is needed for NMI/doublefault, but of course page table switch is performed. On return to usermode, the trampoline is used again, iret frame is copied to the trampoline stack, page tables are switched and iretq is executed. The case of iretq faulting due to the invalid usermode context is tricky, since the frame for fault is appended to the trampoline frame. Besides copying the fault frame and original (corrupted) frame to kstack, the fault frame must be patched to make it look as if the fault occured on the kstack, see the comment in doret_iret detection code in trap(). Currently kernel pages which are mapped during trampoline operation are identical for all pmaps. They are registered using pmap_pti_add_kva(). Besides initial registrations done during boot, LDT and non-common TSS segments are registered if user requested their use. In principle, they can be installed into kernel page table per pmap with some work. Similarly, PCPU can be hidden from userspace mapping using trampoline PCPU page, but again I do not see much benefits besides complexity. PDPE pages for the kernel half of the user page tables are pre-allocated during boot because we need to know pml4 entries which are copied to the top-level paging structure page, in advance on a new pmap creation. I enforce this to avoid iterating over the all existing pmaps if a new PDPE page is needed for PTI kernel mappings. The iteration is a known problematic operation on i386. The need to flush hidden kernel translations on the switch to user mode make global tables (PG_G) meaningless and even harming, so PG_G use is disabled for PTI case. Our existing use of PCID is incompatible with PTI and is automatically disabled if PTI is enabled. PCID can be forced on only for developer's benefit. MCE is known to be broken, it requires IST stack to operate completely correctly even for non-PTI case, and absolutely needs dedicated IST stack because MCE delivery while trampoline did not switched from PTI stack is fatal. The fix is pending. Reviewed by: markj (partially) Tested by: pho (previous version) Discussed with: jeff, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 4275e16f	10-Jan-2018	Konstantin Belousov <kib@FreeBSD.org>	Rename COMMON_TSS_RSP0 to TSS_RSP0. The symbol is just an offset in the hardware TSS structure, it is not limited to the common_tss instance. Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 843d5752	05-Oct-2017	Konstantin Belousov <kib@FreeBSD.org>	Update comment to note that we skip LDT reload for kthreads as well. Noted by: bde Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 0d4e7ec5	26-Aug-2017	Ryan Libby <rlibby@FreeBSD.org>	amd64: drop q suffix from rd[fg]sbase for gas compatibility Reviewed by: kib Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12133
# 3e902b3d	21-Aug-2017	Konstantin Belousov <kib@FreeBSD.org>	Make WRFSBASE and WRGSBASE instructions functional. Right now, we enable the CR4.FSGSBASE bit on CPUs which support the facility (Ivy and later), to allow usermode to read fs and gs bases without syscalls. This bit also controls the write access to bases from userspace, but WRFSBASE and WRGSBASE instructions currently cannot be used, because return path from both exceptions or interrupts overrides bases with the values from pcb. Supporting the instructions is useful because this means that usermode can implement green-threads completely in userspace without issuing syscalls to change all of the machine context. Support is implemented by saving the fs base and user gs base when PCB_FULL_IRET flag is set. The flag is set on the context switch, which potentially causes clobber of the bases due to activation of another context, and when explicit modification of the user context by a syscall or exception handler is performed. In particular, the patch moves setting of the flag before syscalls change context. The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on entry from userspace can be considered a bug fixes on its own. Reviewed by: jhb (previous version) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D12023
# bd101a66	15-May-2017	Konstantin Belousov <kib@FreeBSD.org>	Ensure that resume path on amd64 only accesses page tables for normal operation after processor is configured to allow all required features. In particular, NX must be enabled in EFER, otherwise load of page table element with nx bit set causes reserved bit page fault. Since malloc uses direct mapping for small allocations, in particular for the suspension pcbs, and DMAP is nx after r316767, this commit tripped fault on resume path. Restore complete state of EFER while wakeup code is still executing with custom page table, before calling resumectx, instead of trying to guess which features might be needed before resumectx restored EFER on its own. Bisected and tested by: trasz Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# a546448b	09-May-2015	Konstantin Belousov <kib@FreeBSD.org>	Rewrite amd64 PCID implementation to follow an algorithm described in the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency Algorithms". The same algorithm is already utilized by the MIPS pmap to handle ASIDs. The PCID for the address space is now allocated per-cpu during context switch to the thread using pmap, when no PCID on the cpu was ever allocated, or the current PCID is invalidated. If the PCID is reused, bit 63 of %cr3 can be set to avoid TLB flush. Each cpu has PCID' algorithm generation count, which is saved in the pmap pcpu block when pcpu PCID is allocated. On invalidation, the pmap generation count is zeroed, which signals the context switch code that already allocated PCID is no longer valid. The implication is the TLB shootdown for the given cpu/address space, due to the allocation of new PCID. The pm_save mask is no longer has to be tracked, which (significantly) reduces the targets of the TLB shootdown IPIs. Previously, pm_save was reset only on pmap_invalidate_all(), which made it accumulate the cpuids of all processors on which the thread was scheduled between full TLB shootdowns. Besides reducing the amount of TLB shootdowns and removing atomics to update pm_saves in the context switch code, the algorithm is much simpler than the maintanence of pm_save and selection of the right address space in the shootdown IPI handler. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
# b1d735ba	06-Sep-2014	John Baldwin <jhb@FreeBSD.org>	Create a separate structure for per-CPU state saved across suspend and resume that is a superset of a pcb. Move the FPU state out of the pcb and into this new structure. As part of this, move the FPU resume code on amd64 into a C function. This allows resumectx() to still operate only on a pcb and more closely mirrors the i386 code. Reviewed by: kib (earlier version)
# 1d22d877	04-Mar-2014	Jung-uk Kim <jkim@FreeBSD.org>	Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c. Inspired by: jhb
# 603bc162	04-Mar-2014	Jung-uk Kim <jkim@FreeBSD.org>	Properly save and restore CR0. MFC after: 3 days
# 05acaa9f	04-Mar-2014	Jung-uk Kim <jkim@FreeBSD.org>	Remove dead code since r230426, fix a comment, and tidy up. Reported by: jhb MFC after: 3 days
# 37eed841	30-Aug-2013	Konstantin Belousov <kib@FreeBSD.org>	Implement support for the process-context identifiers ('PCID') on Intel CPUs. The feature tags TLB entries with the Id of the address space and allows to avoid TLB invalidation on the context switch, it is available only in the long mode. In the microbenchmarks, using the PCID decreased latency of the context switches by ~30% on SandyBridge class desktop CPUs, measured with the lat_ctx program from lmbench. If available, use INVPCID instruction when a TLB entry in non-current address space needs to be invalidated. The instruction is typically available on the Haswell. If needed, the use of PCID can be turned off with the vm.pmap.pcid_enabled loader tunable set to 0. The state of the feature is reported by the vm.pmap.pcid_enabled sysctl. The sysctl vm.pmap.pcid_save_cnt reports the number of context switches which avoided invalidating the TLB; compare with the total number of context switches, available as sysctl vm.stats.sys.v_swtch. Sponsored by: The FreeBSD Foundation Reviewed by: alc Tested by: pho, bf
# 333d0c60	14-Jul-2012	Konstantin Belousov <kib@FreeBSD.org>	Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage mostly meets the guidelines set by the Intel SDM: 1. We use XRSTOR and XSAVE from the same CPL using the same linear address for the store area 2. Contrary to the recommendations, we cannot zero the FPU save area for a new thread, since fork semantic requires the copy of the previous state. This advice seemingly contradicts to the advice from the item 6. 3. We do use XSAVEOPT in the context switch code only, and the area for XSAVEOPT already always contains the data saved by XSAVE. 4. We do not modify the save area between XRSTOR, when the area is loaded into FPU context, and XSAVE. We always spit the fpu context into save area and start emulation when directly writing into FPU context. 5. We do not use segmented addressing to access save area, or rather, always address it using %ds basing. 6. XSAVEOPT can be only executed in the area which was previously loaded with XRSTOR, since context switch code checks for FPU use by outgoing thread before saving, and thread which stopped emulation forcibly get context loaded with XRSTOR. 7. The PCB cannot be paged out while FPU emulation is turned off, since stack of the executing thread is never swapped out. The context switch code is patched to issue XSAVEOPT instead of XSAVE if supported. This approach eliminates one conditional in the context switch code, which would be needed otherwise. For user-visible machine context to have proper data, fpugetregs() checks for unsaved extension blocks and manually copies pristine FPU state into them, according to the description provided by CPUID leaf 0xd. MFC after: 1 month
# f18d5bf4	06-Jul-2012	Konstantin Belousov <kib@FreeBSD.org>	Use assembler mnemonic instead of manually assembling, contination for r238142. Reviewed by: jhb MFC after: 1 month
# 6ad79910	13-Jun-2012	Jung-uk Kim <jkim@FreeBSD.org>	- Remove unused code for CR3 and CR4. - Fix few style(9) nits while I am here.
# acd7df97	13-Jun-2012	Jung-uk Kim <jkim@FreeBSD.org>	- Fix resumectx() prototypes to reflect reality. - For i386, simply jump to resumectx() with PCB in %ecx. - Fix a style(9) nit while I am here.
# fb864578	08-Jun-2012	Mitsuru IWASAKI <iwasaki@FreeBSD.org>	Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of suspend/resume procedures are minimized among them. common: - Add global cpuset suspended_cpus to indicate APs are suspended/resumed. - Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used). - Add some variables in acpi_wakecode.S in order to minimize the difference among amd64 and i386. - Disable load_cr3() because now CR3 is restored in resumectx(). amd64: - Add suspend/resume related members (such as MSR) in PCB. - Modify savectx() for above new PCB members. - Merge acpi_switch.S into cpu_switch.S as resumectx(). i386: - Merge(and remove) suspendctx() into savectx() in order to match with amd64 code. Reviewed by: attilio@, acpi@
# 49a30208	27-Feb-2012	John Baldwin <jhb@FreeBSD.org>	Update incorrect comment.
# 8c6f8f3d	21-Jan-2012	Konstantin Belousov <kib@FreeBSD.org>	Add support for the extended FPU states on amd64, both for native 64bit and 32bit ABIs. As a side-effect, it enables AVX on capable CPUs. In particular: - Query the CPU support for XSAVE, list of the supported extensions and the required size of FPU save area. The hw.use_xsave tunable is provided for disabling XSAVE, and hw.xsave_mask may be used to select the enabled extensions. - Remove the FPU save area from PCB and dynamically allocate the (run-time sized) user save area on the top of the kernel stack, right above the PCB. Reorganize the thread0 PCB initialization to postpone it after BSP is queried for save area size. - The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as well. FPU state is only useful for suspend, where it is saved in dynamically allocated suspfpusave area. - Use XSAVE and XRSTOR to save/restore FPU state, if supported and enabled. - Define new mcontext_t flag _MC_HASFPXSTATE, indicating that mcontext_t has a valid pointer to out-of-struct extended FPU state. Signal handlers are supplied with stack-allocated fpu state. The sigreturn(2) and setcontext(2) syscall honour the flag, allowing the signal handlers to inspect and manipilate extended state in the interrupted context. - The getcontext(2) never returns extended state, since there is no place in the fixed-sized mcontext_t to place variable-sized save area. And, since mcontext_t is embedded into ucontext_t, makes it impossible to fix in a reasonable way. Instead of extending getcontext(2) syscall, provide a sysarch(2) facility to query extended FPU state. - Add ptrace(2) support for getting and setting extended state; while there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries. - Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to consumers, making it opaque. Internally, struct fpu_kern_ctx now contains a space for the extended state. Convert in-kernel consumers of fpu_kern KPI both on i386 and amd64. First version of the support for AVX was submitted by Tim Bird <tim.bird am sony com> on behalf of Sony. This version was written from scratch. Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org> MFC after: 1 month
# 50e3cec3	22-Dec-2010	Jung-uk Kim <jkim@FreeBSD.org>	Increase size of pcb_flags to four bytes. Requested by: bde, jhb
# e6c006d9	21-Dec-2010	Jung-uk Kim <jkim@FreeBSD.org>	Improve PCB flags handling and make it more robust. Add two new functions for manipulating pcb_flags. These inline functions are very similar to atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK prefix for SMP. Add comments about the rationale[1]. Use these functions wherever possible. Although there are some places where it is not strictly necessary (e.g., a PCB is copied to create a new PCB), it is done across the board for sake of consistency. Turn pcb_full_iret into a PCB flag as it is safe now. Move rarely used fields before pcb_flags and reduce size of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in the neighborhood. Reviewed by: kib Submitted by: kib[1] MFC after: 2 months
# cfe92f33	24-Nov-2010	Dimitry Andric <dim@FreeBSD.org>	Change ambiguous (or invalid, depending on how strict you want to be :) assembly instruction "movw %rcx,2(%rax)" to "movw %cx,2(%rax)", since the intent was to move 16 bits of data, in this case. Found by: clang Reviewed by: kib
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 305c5c0a	30-Aug-2010	Jung-uk Kim <jkim@FreeBSD.org>	Save MSR_FSBASE, MSR_GSBASE and MSR_KGSBASE directly to PCB as we do not use these values in the function.
# 3ab42a25	03-Aug-2010	Jung-uk Kim <jkim@FreeBSD.org>	savectx() has not been used for fork(2) for about 15 years. [1] Do not clobber FPU thread's PCB as it is more harmful. When we resume CPU, unconditionally reload FPU state. Pointed out by: bde [1]
# a2d2c836	02-Aug-2010	Jung-uk Kim <jkim@FreeBSD.org>	- Merge savectx2() with savectx() and struct xpcb with struct pcb. [1] savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs). Thus, saving additional information does not hurt and it may be even beneficial. Unfortunately, struct pcb has grown larger to accommodate more data. Move 512-byte long pcb_user_save to the end of struct pcb while I am here. - savectx() now saves FPU state unconditionally and copy it to the PCB of FPU thread if necessary. This gives panic dump and kdb a chance to take a look at the current FPU state even if the FPU is "supposedly" not used. - Resuming CPU now unconditionally reinitializes FPU. If the saved FPU state was irrelevant, it could be in an unknown state. Suggested by: bde [1]
# 9727ca6a	29-Jul-2010	Jung-uk Kim <jkim@FreeBSD.org>	Fix another fallout from r208833. savectx() is used to save CPU context for crash dump (dumppcb) and kdb (stoppcbs). For both cases, there cannot have a valid pointer in pcb_save. This should restore the previous behaviour.
# 39381048	29-Jul-2010	Jung-uk Kim <jkim@FreeBSD.org>	Rename PCB_USER_FPU to PCB_USERFPU not to clash with a macro from fpu.h.
# 9bfb10b1	26-Jul-2010	Jung-uk Kim <jkim@FreeBSD.org>	Re-implement FPU suspend/resume for amd64. This removes superfluous uses of critical_enter(9) and critical_exit(9) by fpugetregs() and fpusetregs(). Also, we do not touch PCB flags any more. MFC after: 1 month
# aaa95ccb	12-Jul-2010	Konstantin Belousov <kib@FreeBSD.org>	When switching the thread from the processor, store %dr7 content into the pcb before disabling watchpoints. Otherwise, when the thread is restored on a processor, watchpoints are still disabled. Submitted by: Tijl Coosemans <tijl coosemans org> (I would be much happier if Tijl commited this himself) MFC after: 1 week
# 77cb6e6f	07-Jul-2010	Alan Cox <alc@FreeBSD.org>	Correctly maintain the per-cpu field "curpmap" on amd64 just like we do on i386. The consequences of not doing so on amd64 became apparent with the introduction of the COUNT_IPIS and COUNT_XINVLTLB_HITS options. Specifically, single-threaded applications were generating unnecessary IPIs to shoot-down the TLB on other processors. However, this is clearly nonsensical because a single-threaded application is only running on the current processor. The reason that this happens is that pmap_activate() is unable to properly update the old pmap's field "pm_active" without the correct "curpmap". So, in effect, stale bits in "pm_active" were leading pmap_protect(), pmap_remove(), pmap_remove_pages(), etc. to flush the TLB contents on some arbitrary processor that wasn't even running the same application. Reviewed by: kib MFC after: 3 weeks
# 6cf9a08d	05-Jun-2010	Konstantin Belousov <kib@FreeBSD.org>	Introduce the x86 kernel interfaces to allow kernel code to use FPU/SSE hardware. Caller should provide a save area that is chained into the stack of the areas; pcb save_area for usermode FPU state is on top. The pcb now contains a pointer to the current FPU saved area, used during FPUDNA handling and context switches. There is also a facility to allow the kernel thread to use pcb save_area. Change the dreaded warnings "npxdna in kernel mode!" into the panics when FPU usage is not registered. KPI discussed with: fabient Tested by: pho, fabient Hardware provided by: Sentex Communications MFC after: 1 month
# a2622e5d	09-Jul-2009	Konstantin Belousov <kib@FreeBSD.org>	Restore the segment registers and segment base MSRs for amd64 syscall return path only when neither thread was context switched while executing syscall code nor syscall explicitely modified LDT or MSRs. Save segment registers in trap handlers before interrupts are enabled, to not allow context switches to happen before registers are saved. Use separated byte in pcb for indication of fast/full return, since pcb_flags are not synchronized with context switches. The change puts back syscall microbenchmark numbers that were slowed down after commit of the support for LDT on amd64. Reviewed by: jeff Tested (and tested, and tested ...) by: pho Approved by: re (kensmith)
# 2c66ccca	01-Apr-2009	Konstantin Belousov <kib@FreeBSD.org>	Save and restore segment registers on amd64 when entering and leaving the kernel on amd64. Fill and read segment registers for mcontext and signals. Handle traps caused by restoration of the invalidated selectors. Implement user-mode creation and manipulation of the process-specific LDT descriptors for amd64, see sysarch(2). Implement support for TSS i/o port access permission bitmap for amd64. Context-switch LDT and TSS. Do not save and restore segment registers on the context switch, that is handled by kernel enter/leave trampolines now. Remove segment restore code from the signal trampolines for freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason. Implement amd64-specific compat shims for sysarch. Linuxolator (temporary ?) switched to use gsbase for thread_area pointer. TODO: Currently, gdb is not adapted to show segment registers from struct reg. Also, no machine-depended ptrace command is added to set segment registers for debugged process. In collaboration with: pho Discussed with: peter Reviewed by: jhb Linuxolator tested by: dchagin
# c66d2b38	16-Mar-2009	Jung-uk Kim <jkim@FreeBSD.org>	Initial suspend/resume support for amd64. This code is heavily inspired by Takanori Watanabe's experimental SMP patch for i386 and large portion was shamelessly cut and pasted from Peter Wemm's AP boot code.
# e6493bbe	31-Jan-2009	David E. O'Brien <obrien@FreeBSD.org>	Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions for moving between a segment register and a 32-bit memory location. Looked at by: jhb
# 5c0c22e9	19-Jan-2009	Konstantin Belousov <kib@FreeBSD.org>	The context switch to the 32bit binary does not properly restore the fsbase value. The switch loads the fs segment register, that invalidates the value in fsbase msr, thus value in %r9 can not be considered the current value for fsbase anymore. Unconditionally reload fsbase when switching to 32bit binary. PR: 130526 MFC after: 3 weeks
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 3bd5e467	08-Sep-2008	Konstantin Belousov <kib@FreeBSD.org>	The pcb_gs32p should be per-cpu, not per-thread pointer. This is location in GDT where the segment descriptor from pcb_gs32sd is copied, and the location is in GDT local to CPU. Noted and reviewed by: peter MFC after: 1 week
# f98c3ea7	02-Sep-2008	Konstantin Belousov <kib@FreeBSD.org>	- When executing FreeBSD/amd64 binaries from FreeBSD/i386 or Linux/i386 processes, clear PCB_32BIT and PCB_GS32BIT bits [1]. - Reread the fs and gs bases from the msr unconditionally, not believing the values in pcb_fsbase and pcb_gsbase, since usermode may reload segment registers, invalidating the cache. [2]. Both problems resulted in the wrong fs base, causing wrong tls pointer be dereferenced in the usermode. Reported and tested by: Vyacheslav Bocharov <adeepv at gmail com> [1] Reported by: Bernd Walter <ticsoat cicely7 cicely de>, Artem Belevich <fbsdlist at src cx>[2] Reviewed by: peter MFC after: 3 days
# 8f4a1f3a	30-Jul-2008	Konstantin Belousov <kib@FreeBSD.org>	Bring back the save/restore of the %ds, %es, %fs and %gs registers for the 32bit images on amd64. Change the semantic of the PCB_32BIT pcb flag to request the context switch code to operate on the segment registers. Its previous meaning of saving or restoring the %gs base offset is assigned to the new PCB_GS32BIT flag. FreeBSD 32bit image activator sets the PCB_32BIT flag, while Linux 32bit emulation sets PCB_32BIT \| PCB_GS32BIT. Reviewed by: peter MFC after: 2 weeks
# f001eabf	23-Mar-2008	Peter Wemm <peter@FreeBSD.org>	First pass at (possibly futile) microoptimizing of cpu_switch. Results are mixed. Some pure context switch microbenchmarks show up to 29% improvement. Pipe based context switch microbenchmarks show up to 7% improvement. Real world tests are far less impressive as they are dominated more by actual work than switch overheads, but depending on the machine in question, workload, kernel options, phase of moon, etc, a few percent gain might be seen. Summary of changes: - don't reload MSR_[FG]SBASE registers when context switching between non-threaded userland apps. These typically cost 120 clock cycles each on an AMD cpu (less on Barcelona/Phenom). Intel cores are probably no faster on this. - The above change only helps unthreaded userland apps that tend to use the same value for gsbase. Threaded apps will get no benefit from this. - reorder things like accessing the pcb to be in memory order, to give prefetching a better chance of working. Operations are now in increasing memory address order, rather than reverse or random. - Push some lesser used code out of the main code paths. Hopefully allowing better code density in cache lines. This is probably futile. - (part 2 of previous item) Reorder code so that branches have a more realistic static branch prediction hint. Both Intel and AMD cpus default to predicting branches to lower memory addresses as being taken, and to higher memory addresses as not being taken. This is overridden by the limited dynamic branch prediction subsystem. A trip through userland might overflow this. - Futule attempt at spreading the use of the results of previous operations in new operations. Hopefully this will allow the cpus to execute in parallel better. - stop wasting 16 bytes at the top of kernel stack, below the PCB. - Never load the userland fs/gsbase registers for kthreads, but preserve curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!) Microbenchmarking this code seems to be really sensitive to things like scheduling luck, timing, cache behavior, tlb behavior, kernel options, other random code changes, etc. While it doesn't help heavy userland workloads much, it does help high context switch loads a little, and should help those that involve switching via kthreads a bit more. A special thanks to Kris for the testing and reality checks, and Jeff for tormenting me into doing this. :) This is still work-in-progress.
# ea497502	21-Aug-2007	Joseph Koshy <jkoshy@FreeBSD.org>	Assign sizes to assembly language support functions. Approved by: re (kensmith)
# 40380a6a	17-Jul-2007	Jeff Roberson <jeff@FreeBSD.org>	- Optimize the amd64 cpu_switch() TD_LOCK blocking and releasing to require fewer blocking loops. - Don't use atomic ops with 4BSD or on UP. - Only use the blocking loop if ULE is compiled in. - Use the correct memory barrier. Discussed with: attilio, jhb, ssouhlal Tested by: current@ Approved by: re
# 42ce445f	06-Jun-2007	David Xu <davidxu@FreeBSD.org>	Backout experimental adaptive-spin umtx code.
# 5d68dad3	04-Jun-2007	Jeff Roberson <jeff@FreeBSD.org>	- Add a new argument to cpu_switch. This is a pointer to a mutex that oldthread should point at before we return. - When cpu_switch() is called the td_lock pointer in the old thread may point at the blocked lock. This prevents other processors from switching into this thread while we're still switching out. Wait until we're done deactivating the vmspace before we release the thread by assigning to td_lock. - Before we can activate the new vmspace we must make sure that the new thread is not assigned to the blocked lock. It may be in the process of switching out on another cpu. Spin until the new thread is available.
# 9c5b213e	29-Mar-2007	Jung-uk Kim <jkim@FreeBSD.org>	MFP4: Linux set_thread_area syscall (aka TLS) support for amd64. Initial version was submitted by Divacky Roman and mostly rewritten by me. Tested by: emulation
# 4e32b7b3	19-Dec-2006	David Xu <davidxu@FreeBSD.org>	Add a lwpid field into per-cpu structure, the lwpid represents current running thread's id on each cpu. This allow us to add in-kernel adaptive spin for user level mutex. While spinning in user space is possible, without correct thread running state exported from kernel, it hardly can be implemented efficiently without wasting cpu cycles, however exporting thread running state unlikely will be implemented soon as it has to design and stablize interfaces. This implementation is transparent to user space, it can be disabled dynamically. With this change, mutex ping-pong program's performance is improved massively on SMP machine. performance of mysql super-smack select benchmark is increased about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems which have bunch of cpus and system-call overhead is low (athlon64, opteron, and core-2 are known to be fast), the adaptive spin does help performance. Added sysctls: kern.threads.umtx_dflt_spins if the sysctl value is non-zero, a zero umutex.m_spincount will cause the sysctl value to be used a spin cycle count. kern.threads.umtx_max_spins the sysctl sets upper limit of spin cycle count. Tested on: Athlon64 X2 3800+, Dual Xeon 5130
# 9313eb55	17-Oct-2005	David Xu <davidxu@FreeBSD.org>	Micro optimization for context switch. Eliminate code for saving gs.base and fs.base. We always update pcb.pcb_gsbase and pcb.pcb_fsbase when user wants to set them, in context switch routine, we only need to write them into registers, we never have to read them out from registers when thread is switched away. Since rdmsr is a serialization instruction, micro benchmark shows it is worthy to do. Reviewed by: peter, jhb
# 1acc225f	27-Sep-2005	Peter Wemm <peter@FreeBSD.org>	Kill pcb_rflags. It served no purpose. Reported by: bde
# 98df9e00	27-Sep-2005	Peter Wemm <peter@FreeBSD.org>	Fix a minor nit that has been bugging me for a while. Fix the obvious cases of using a 64 bit operation to zero a register. 32 bit opcodes are smaller and supposedly faster, and clear the upper 32 bits for free.
# 2c87e001	16-Aug-2004	Peter Wemm <peter@FreeBSD.org>	Sync with i386 - s/cpu_swtch/cpu_switch/
# df4fd277	16-May-2004	Peter Wemm <peter@FreeBSD.org>	Checkpoint some of what I was starting to tinker with for having some different context support for 32 vs 64 bit processes. This simply omits the save/restore of the segment selector registers for non 32 bit processes. This avoids the rdmsr/rwmsr juggling when restoring %gs clobbers the kernel msr that holds the gsbase. However, I suspect it might be better to conditionally do this at user<->kernel transition where we wouldn't need to do the juggling in the first place. Or have per-thread extended context save/restore hooks.
# 12c1418c	16-May-2004	Peter Wemm <peter@FreeBSD.org>	Kill the LAZYPMAP ifdefs. While they worked, they didn't do anything to help the AMD cpus (which have a hardware tlb flush filter). I held off to see what the 64 bit Intel cpus did, but it doesn't seem to help much there either. Oh well, store it in the Attic.
# 9a80fddc	05-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm. Approved by: core, peter
# 51621230	06-Feb-2004	Peter Wemm <peter@FreeBSD.org>	Remove the badsw* INVARIANTS checks. The events that this attempts to catch are already nicely caught by trapping the null pointer derefs. Remove no-longer-used noswitch/nothrow strings. They were referenced by the stub cpu_switch() etc functions before they were implemented. Try something a little different for the lock prefixes. Prompted by: bde (the first two items anyway)
# db527225	28-Jan-2004	Peter Wemm <peter@FreeBSD.org>	Take another shot at the invariants calls to __panic. They hadn't been updated for the regparm ABI on amd64. Context switch debug regs. Update for fpu simplification Don't needlessly reload %cr3, in case the cpu has the tlb flush filter turned off. Re-add LAZY_SWITCH stubs.
# ccaa41bc	22-Jan-2004	Peter Wemm <peter@FreeBSD.org>	Unbreak amd64: Rename calls from panic to __panic
# 0d2a2989	17-Nov-2003	Peter Wemm <peter@FreeBSD.org>	Initial landing of SMP support for FreeBSD/amd64. - This is heavily derived from John Baldwin's apic/pci cleanup on i386. - I have completely rewritten or drastically cleaned up some other parts. (in particular, bootstrap) - This is still a WIP. It seems that there are some highly bogus bioses on nVidia nForce3-150 boards. I can't stress how broken these boards are. I have a workaround in mind, but right now the Asus SK8N is broken. The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed. - Most of my testing has been with SCHED_ULE. SCHED_4BSD works. - the apic and acpi components are 'standard'. - If you have an nVidia nForce3-150 board, you are stuck with 'device atpic' in addition, because they somehow managed to forget to connect the 8254 timer to the apic, even though its in the same silicon! ARGH! This directly violates the ACPI spec.
# fcfe57d6	07-Nov-2003	Peter Wemm <peter@FreeBSD.org>	Update the graffiti.
# bf2f09ee	07-Nov-2003	Peter Wemm <peter@FreeBSD.org>	The great s/npx/fpu/gi
# c0a54ff6	14-May-2003	Peter Wemm <peter@FreeBSD.org>	Collect the nastiness for preserving the kernel MSR_GSBASE around the load_gs() calls into a single place that is less likely to go wrong. Eliminate the per-process context switching of MSR_GSBASE, because it should be constant for a single cpu. Instead, save/restore it during the loading of the new %gs selector for the new process. Approved by: re (amd64/* blanket)
# d85631c4	13-May-2003	Peter Wemm <peter@FreeBSD.org>	Add BASIC i386 binary support for the amd64 kernel. This is largely stolen from the ia64/ia32 code (indeed there was a repocopy), but I've redone the MD parts and added and fixed a few essential syscalls. It is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic) and p4. The ia64 code has not implemented signal delivery, so I had to do that. Before you say it, yes, this does need to go in a common place. But we're in a freeze at the moment and I didn't want to risk breaking ia64. I will sort this out after the freeze so that the common code is in a common place. On the AMD64 side, this required adding segment selector context switch support and some other support infrastructure. The %fs/%gs etc code is hairy because loading %gs will clobber the kernel's current MSR_GSBASE setting. The segment selectors are not used by the kernel, so they're only changed at context switch time or when changing modes. This still needs to be optimized. Approved by: re (amd64/* blanket)
# bf1e8974	11-May-2003	Peter Wemm <peter@FreeBSD.org>	Give a %fs and %gs to userland. Use swapgs to obtain the kernel %GS.base value on entry and exit. This isn't as easy as it sounds because when we recursively trap or interrupt, we have to avoid duplicating the swapgs instruction or we end up back with the userland %gs. I implemented this by testing TF_CS to see if we're coming from supervisor mode already, and check for returning to supervisor. To avoid a race with interrupts in the brief period after beginning executing the handler and before the swapgs, convert all trap gates to interrupt gates, and reenable interrupts immediately after the swapgs. I am not happy with this. There are other possible ways to do this that should be investigated. (eg: storing the GS.base MSR value in the trapframe) Add some sysarch functions to let the userland code get to this. Approved by: re (blanket amd64/*)
# afa88623	30-Apr-2003	Peter Wemm <peter@FreeBSD.org>	Commit MD parts of a loosely functional AMD64 port. This is based on a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to attempt to get a stable base to start from. There is a lot missing still. Worth noting: - The kernel runs at 1GB in order to cheat with the pmap code. pmap uses a variation of the PAE code in order to avoid having to worry about 4 levels of page tables yet. - It boots in 64 bit "long mode" with a tiny trampoline embedded in the i386 loader. This simplifies locore.s greatly. - There are still quite a few fragments of i386-specific code that have not been translated yet, and some that I cheated and wrote dumb C versions of (bcopy etc). - It has both int 0x80 for syscalls (but using registers for argument passing, as is native on the amd64 ABI), and the 'syscall' instruction for syscalls. int 0x80 preserves all registers, 'syscall' does not. - I have tried to minimize looking at the NetBSD code, except in a couple of places (eg: to find which register they use to replace the trashed %rcx register in the syscall instruction). As a result, there is not a lot of similarity. I did look at NetBSD a few times while debugging to get some ideas about what I might have done wrong in my first attempt.
# c81e825f	05-Apr-2003	Peter Wemm <peter@FreeBSD.org>	Unbreak the !LAZY_SWITCH case. I #ifdef'ed too much when I added the ifdefs prior to commit and killed the same-address-space test. Submitted by: bde
# cc66ebe2	02-Apr-2003	Peter Wemm <peter@FreeBSD.org>	Commit a partial lazy thread switch mechanism for i386. it isn't as lazy as it could be and can do with some more cleanup. Currently its under options LAZY_SWITCH. What this does is avoid %cr3 reloads for short context switches that do not involve another user process. ie: we can take an interrupt, switch to a kthread and return to the user without explicitly flushing the tlb. However, this isn't as exciting as it could be, the interrupt overhead is still high and too much blocks on Giant still. There are some debug sysctls, for stats and for an on/off switch. The main problem with doing this has been "what if the process that you're running on exits while we're borrowing its address space?" - in this case we use an IPI to give it a kick when we're about to reclaim the pmap. Its not compiled in unless you add the LAZY_SWITCH option. I want to fix a few more things and get some more feedback before turning it on by default. This is NOT a replacement for Bosko's lazy interrupt stuff. This was more meant for the kthread case, while his was for interrupts. Mine helps a little for interrupts, but his helps a lot more. The stats are enabled with options SWTCH_OPTIM_STATS - this has been a pseudo-option for years, I just added a bunch of stuff to it. One non-trivial change was to select a new thread before calling cpu_switch() in the first place. This allows us to catch the silly case of doing a cpu_switch() to the current process. This happens uncomfortably often. This simplifies a bit of the asm code in cpu_switch (no longer have to call choosethread() in the middle). This has been implemented on i386 and (thanks to jake) sparc64. The others will come soon. This is actually seperate to the lazy switch stuff. Glanced at by: jake, jhb
# 2fbe601a	22-Jan-2003	Peter Wemm <peter@FreeBSD.org>	Now that TPR isn't bogusly raised at boot, there is no need to clear it at context switch.
# e344afe7	20-Jul-2002	Peter Wemm <peter@FreeBSD.org>	Move SWTCH_OPTIM_STATS related code out of cpufunc.h. (This sort of stat gathering is not an x86 cpu feature)
# 33d7ad1a	12-Jul-2002	John Baldwin <jhb@FreeBSD.org>	Set the thread state of the newly chosen to run thread to TDS_RUNNING in choosethread() in MI C code instead of doing it in in assembly in all the various cpu_switch() functions. This fixes problems on ia64 and sparc64. Reviewed by: julian, peter, benno Tested on: i386, alpha, sparc64
# e602ba25	29-Jun-2002	Julian Elischer <julian@FreeBSD.org>	Part 1 of KSE-III The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
# d74ac681	26-Mar-2002	Matthew Dillon <dillon@FreeBSD.org>	Compromise for critical()/cpu_critical() recommit. Cleanup the interrupt disablement assumptions in kern_fork.c by adding another API call, cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it from MI to MD. Temporarily move cpu_critical() from <arch>/include/cpufunc.h to <arch>/<arch>/critical.c (stage-2 will clean this up). Implement interrupt deferral for i386 that allows interrupts to remain enabled inside critical sections. This also fixes an IPI interlock bug, and requires uses of icu_lock to be enclosed in a true interrupt disablement. This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized, and will move cpu_critical() into its own header file(s) + other things. This commit may break non-i386 architectures in trivial ways. This should be temporary. Reviewed by: core Approved by: core
# 181df8c9	26-Feb-2002	Matthew Dillon <dillon@FreeBSD.org>	revert last commit temporarily due to whining on the lists.
# f96ad4c2	26-Feb-2002	Matthew Dillon <dillon@FreeBSD.org>	STAGE-1 of 3 commit - allow (but do not require) interrupts to remain enabled in critical sections and streamline critical_enter() and critical_exit(). This commit allows an architecture to leave interrupts enabled inside critical sections if it so wishes. Architectures that do not wish to do this are not effected by this change. This commit implements the feature for the I386 architecture and provides a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For now you can turn the sysctl on and off at any time in order to test the architectural changes or track down bugs. This commit is just the first stage. Some areas of the code, specifically the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will be cleaned up in the STAGE-2 commit when the critical_() functions are moved entirely into MD files. The following changes have been made: critical_enter() and critical_exit() for I386 now simply increment and decrement curthread->td_critnest. They no longer disable hard interrupts. When critical_exit() decrements the counter to 0 it effectively calls a routine to deal with whatever interrupts were deferred during the time the code was operating in a critical section. Other architectures are unaffected. * fork_exit() has been conditionalized to remove MD assumptions for the new code. Old code will still use the old MD assumptions in regards to hard interrupt disablement. In STAGE-2 this will be turned into a subroutine call into MD code rather then hardcoded in MI code. The new code places the burden of entering the critical section in the trampoline code where it belongs. * I386: interrupts are now enabled while we are in a critical section. The interrupt vector code has been adjusted to deal with the fact. If it detects that we are in a critical section it currently defers the interrupt by adding the appropriate bit to an interrupt mask. * In order to accomplish the deferral, icu_lock is required. This is i386-specific. Thus icu_lock can only be obtained by mainline i386 code while interrupts are hard disabled. This change has been made. * Because interrupts may or may not be hard disabled during a context switch, cpu_switch() can no longer simply assume that PSL_I will be in a consistent state. Therefore, it now saves and restores eflags. * FAST INTERRUPT PROVISION. Fast interrupts are currently deferred. The intention is to eventually allow them to operate either while we are in a critical section or, if we are able to restrict the use of sched_lock, while we are not holding the sched_lock. * ICU and APIC vector assembly for I386 cleaned up. The ICU code has been cleaned up to match the APIC code in regards to format and macro availability. Additionally, the code has been adjusted to deal with deferred interrupts. * Deferred interrupts use a per-cpu boolean int_pending, and masks ipending, spending, and fpending. Being per-cpu variables it is not currently necessary to lock; bus cycles modifying them. Note that the same mechanism will enable preemption to be incorporated as a true software interrupt without having to further hack up the critical nesting code. * Note: the old critical_enter() code in kern/kern_switch.c is currently #ifdef to be compatible with both the old and new methodology. In STAGE-2 it will be moved entirely to MD code. Performance issues: One of the purposes of this commit is to enhance critical section performance, specifically to greatly reduce bus overhead to allow the critical section code to be used to protect per-cpu caches. These caches, such as Jeff's slab allocator work, can potentially operate very quickly making the effective savings of the new critical section code's performance very significant. The second purpose of this commit is to allow architectures to enable certain interrupts while in a critical section. Specifically, the intention is to eventually allow certain FAST interrupts to operate rather then defer. The third purpose of this commit is to begin to clean up the critical_enter()/critical_exit()/cpu_critical_enter()/ cpu_critical_exit() API which currently has serious cross pollution in MI code (in fork_exit() and ast() for example). The fourth purpose of this commit is to provide a framework that allows kernel-preempting software interrupts to be implemented cleanly. This is currently used for two forward interrupts in I386. Other architectures will have the choice of using this infrastructure or building the functionality directly into critical_enter()/ critical_exit(). Finally, this commit is designed to greatly improve the flexibility of various architectures to manage critical section handling, software interrupts, preemption, and other highly integrated architecture-specific details.
# 620080d0	07-Feb-2002	Peter Wemm <peter@FreeBSD.org>	Attempt to patch up some style bugs introduced in the previous commit
# 079b7bad	07-Feb-2002	Julian Elischer <julian@FreeBSD.org>	Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
# e744f309	17-Jan-2002	Bruce Evans <bde@FreeBSD.org>	Changed the type of pcb_flags from u_char to u_int and adjusted things. This removes the only atomic operation on a char type in the entire kernel.
# 0bbc8826	11-Dec-2001	John Baldwin <jhb@FreeBSD.org>	Overhaul the per-CPU support a bit: - The MI portions of struct globaldata have been consolidated into a MI struct pcpu. The MD per-CPU data are specified via a macro defined in machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the interface would be cleaner (PCPU_GET(my_md_field) vs. PCPU_GET(md.md_my_md_field)). - All references to globaldata are changed to pcpu instead. In a UP kernel, this data was stored as global variables which is where the original name came from. In an SMP world this data is per-CPU and ideally private to each CPU outside of the context of debuggers. This also included combining machine/globaldata.h and machine/globals.h into machine/pcpu.h. - The pointer to the thread using the FPU on i386 was renamed from npxthread to fpcurthread to be identical with other architectures. - Make the show pcpu ddb command MI with a MD callout to display MD fields. - The globaldata_register() function was renamed to pcpu_init() and now init's MI fields of a struct pcpu in addition to registering it with the internal array and list. - A pcpu_destroy() function was added to remove a struct pcpu from the internal array and list. Tested on: alpha, i386 Reviewed by: peter, jake
# d01404c8	29-Oct-2001	John Baldwin <jhb@FreeBSD.org>	Fix a typo in comment and #ifdef fixes: GRAP_PRIO -> GRAB_PRIO so that x86 SMP kernels actually boot again to single user mode. Pointy hat to: jhb Noticed by: jlemon
# 9869fa1d	28-Oct-2001	John Baldwin <jhb@FreeBSD.org>	- More whitespace and comment cleanups. - Remove unused sw1a label. A breakpoint can be set in choosethread() for the same effect. Reviewed by: bde Submitted by: bde (partly)
# e0e30307	25-Oct-2001	John Baldwin <jhb@FreeBSD.org>	Currently no code does a CROSSJUMP() to sw1a, so we don't need a CROSSJUMPTARGET() for it. Submitted by: bde
# 02c41f11	25-Oct-2001	John Baldwin <jhb@FreeBSD.org>	Use %ecx instead of %ebx for the scratch register while updating %dr7 since %ecx isn't a call safe register and thus we don't have to save and restore it. Submitted by: bde
# 7df8a724	25-Oct-2001	John Baldwin <jhb@FreeBSD.org>	- Fix typo in comment from previous revision. - Fix a bug in the LDT changes where the wrong argument was passed to set_user_ldt() from cpu_switch(). The bug was passing a pointer to the ldt, but set_user_ldt() takes a pointer to the process' mdproc structure. Submitted by: bde
# 163fd6fb	25-Oct-2001	John Baldwin <jhb@FreeBSD.org>	Whitespace, comment, and string fixes. Submitted by: bde (mostly)
# 24db0459	24-Oct-2001	John Baldwin <jhb@FreeBSD.org>	Split the per-process Local Descriptor Table out of the PCB and into struct mdproc. Submitted by: Andrew R. Reiter <arr@watson.org> Silence on: -current
# 2f01a0c0	18-Sep-2001	Peter Wemm <peter@FreeBSD.org>	Fix a mistake I made with the pcb movement relative to the stack in the KSE patch. We need to leave the 16 bytes here for enabling the trapframe to be converted to a vm86trapframe if we're switching to a vm86 context.
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 3ad234d4	18-Jul-2001	Brian S. Dean <bsd@FreeBSD.org>	swtch.s: During context save, use the correct bit mask for clearing the non-reserved bits of dr7. During context restore, load dr7 in such a way as to not disturb reserved bits. machdep.c: Don't explicitly disallow the setting of the reserved bits in dr7 since we now keep from setting them when we load dr7 from the PCB. This allows one to write back the dr7 value obtained from the system without triggering an EINVAL (one of the reserved bits always seems to be set after taking a trace trap). MFC after: 7 days
# c2b095ab	20-May-2001	Bruce Evans <bde@FreeBSD.org>	Use a critical region to protect saving of the npx state in savectx(). Not doing this was fairly harmless because savectx() is only called for panic dumps and the bug could at worse reset the state. savectx() is still missing saving of (volatile) debug registers, and still isn't called for core dumps.
# 8bd57f8f	15-May-2001	John Baldwin <jhb@FreeBSD.org>	Remove unneeded includes of sys/ipl.h and machine/ipl.h.
# 02318dac	24-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	Remove the leading underscore from all symbols defined in x86 asm and used in C or vice versa. The elf compiler uses the same names for both. Remove asnames.h with great prejudice; it has served its purpose. Note that this does not affect the ability to generate an aout kernel due to gcc's -mno-underscores option. moral support from: peter, jhb
# f1532aad	22-Feb-2001	Peter Wemm <peter@FreeBSD.org>	Activate USER_LDT by default. The new thread libraries are going to depend on this. The linux ABI emulator tries to use it for some linux binaries too. VM86 had a bigger cost than this and it was made default a while ago. Reviewed by: jhb, imp
# 5813dc03	19-Feb-2001	John Baldwin <jhb@FreeBSD.org>	- Don't call clear_resched() in userret(), instead, clear the resched flag in mi_switch() just before calling cpu_switch() so that the first switch after a resched request will satisfy the request. - While I'm at it, move a few things into mi_switch() and out of cpu_switch(), specifically set the p_oncpu and p_lastcpu members of proc in mi_switch(), and handle the sched_lock state change across a context switch in mi_switch(). - Since cpu_switch() no longer handles the sched_lock state change, we have to setup an initial state for sched_lock in fork_exit() before we release it.
# d5a08a60	11-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
# d888fc4e	11-Feb-2001	Mark Murray <markm@FreeBSD.org>	RIP <machine/lock.h>. Some things needed bits of <i386/include/lock.h> - cy.c now has its own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK() has been moved to <i386/include/apic.h> (AKA <machine/apic.h>). Reviewed by: jhb
# 142ba5f3	09-Feb-2001	John Baldwin <jhb@FreeBSD.org>	- Make astpending and need_resched process attributes rather than CPU attributes. This is needed for AST's to be properly posted in a preemptive kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and PS_NEEDRESCHED. They are still accesssed by their old macros: aston(), astoff(), etc. For completeness, an astpending() macro has been added to check for a pending AST, and clear_resched() has been added to clear need_resched(). - Rename syscall2() on the x86 back to syscall() to be consistent with other architectures.
# 7dd2de5b	19-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Rename the ASSYM MTX_RECURSE to MTX_RECURSECNT in order to not conflict with the flag of the same name.
# 558226ea	19-Jan-2001	Peter Wemm <peter@FreeBSD.org>	Use #ifdef DEV_NPX from opt_npx.h instead of #if NNPX > 0 from npx.h
# 41ed17bf	06-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Use %fs to access per-cpu variables in uni-processor kernels the same as multi-processor kernels. The old way made it difficult for kernel modules to be portable between uni-processor and multi-processor kernels. It is no longer necessary to jump through hoops. - always load %fs with the private segment on entry to the kernel - change the type of the self referntial pointer from struct privatespace to struct globaldata - make the globaldata symbol have value 0 in all cases, so the symbols in globals.s are always offsets, not aliases for fields in globaldata - define the globaldata space used for uniprocessor kernels in C, rather than assembler - change the assmebly language accessors to use %fs, add a macro PCPU_ADDR(member, reg), which loads the register reg with the address of the per-cpu variable member
# 7d8e3aa0	13-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	Use _lapic+offset to access the local apic from assembly language files, rather than the symbols in globals.s. The offsets are generated by genassym.
# 6d43764a	13-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	Introduce a new potientially cleaner interface for accessing per-cpu variables from i386 assembly language. The syntax is PCPU(member) where member is the capitalized name of the per-cpu variable, without the gd_ prefix. Example: movl %eax,PCPU(CURPROC). The capitalization is due to using the offsets generated by genassym rather than the symbols provided by linking with globals.o. asmacros.h is the wrong place for this but it seemed as good a place as any for now. The old implementation in asnames.h has not been removed because it is still used to de-mangle the symbols used by the C variables for the UP case.
# 1b00f920	08-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	Revert the previous change I made to cpu_switch. It doesn't help as much as I thought it would and according to bde was a pessimization.
# 1306962a	02-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	Change cpu_switch to explicitly popl the callers program counter and pushl that of the new process, rather than doing a movl (%esp) and assuming that the stack has been setup right. This make the initial stack setup slightly more sane, and will make it easier to stick an interrupted process onto the run queue without its knowing.
# 835a748f	17-Nov-2000	John Baldwin <jhb@FreeBSD.org>	- Change extra sanity checks in cpu_switch() to be conditional on INVARIANTS instead of DIAGNOSTIC. - Remove the p_wchan check as it no longer applies since a process may be switched out during CURSIG() within msleep() or mawait(). - Remove an extra sanity check only needed during the early SMPng work.
# ac5f943c	13-Oct-2000	Peter Wemm <peter@FreeBSD.org>	savectx() is now used exclusively by the crash dump system. Move the i386 specific gunk (copy %cr3 to the pcb) from the MI dumpsys() to the MD savectx().
# e4a85a9b	08-Oct-2000	Bruce Evans <bde@FreeBSD.org>	Unremoved used include of <machine/ipl.h>. Removing it in rev.1.95 significantly pessimized syscalls by arranging to do null rescheduling on return from every syscall. (AST_RESCHED was not defined, and the mask ~AST_RESCHED gets replaced by the useless mask ~0. This bug has been fixed before, in rev.1.92.)
# 6c567274	05-Oct-2000	John Baldwin <jhb@FreeBSD.org>	- Change fast interrupts on x86 to push a full interrupt frame and to return through doreti to handle ast's. This is necessary for the clock interrupts to work properly. - Change the clock interrupts on the x86 to be fast instead of threaded. This is needed because both hardclock() and statclock() need to run in the context of the current process, not in a separate thread context. - Kill the prevproc hack as it is no longer needed. - We really need Giant when we call psignal(), but we don't want to block during the clock interrupt. Instead, use two p_flag's in the proc struct to mark the current process as having a pending SIGVTALRM or a SIGPROF and let them be delivered during ast() when hardclock() has finished running. - Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways. It was broken on the x86 if it was turned on since cpl is gone. It's only use was to bogusly run softclock() directly during hardclock() rather than scheduling an SWI. - Remove the COM_LOCK simplelock and replace it with a clock_lock spin mutex. Since the spin mutex already handles disabling/restoring interrupts appropriately, this also lets us axe all the *_intr() fu. - Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use temporary fast interrupts for the APIC trial. - Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending signals in hardclock() that are to be delivered in ast(). Submitted by: jakeb (making statclock safe in a fast interrupt) Submitted by: cp (concept of delaying signals until ast())
# 1931cf94	05-Oct-2000	John Baldwin <jhb@FreeBSD.org>	- Heavyweight interrupt threads on the alpha for device I/O interrupts. - Make softinterrupts (SWI's) almost completely MI, and divorce them completely from the x86 hardware interrupt code. - The ihandlers array is now gone. Instead, there is a MI shandlers array that just contains SWI handlers. - Most of the former machine/ipl.h files have moved to a new sys/ipl.h. - Stub out all the spl*() functions on all architectures. Submitted by: dfr
# 72b535ef	21-Sep-2000	Mike Smith <msmith@FreeBSD.org>	Implement halt-on-idle in the !SMP case, which should significantly reduce power consumption on most systems.
# 0384fff8	06-Sep-2000	Jason Evans <jasone@FreeBSD.org>	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
# 93367624	10-May-2000	Peter Wemm <peter@FreeBSD.org>	Move <machine/ipl.h> outside #ifdef SMP because it supplies AST_RESCHED. Without this, it shows up as an undefined symbol in /kernel. (!) (This looks very freaky when doing a nm /kernel!)
# bd5caafc	28-Mar-2000	Matthew Dillon <dillon@FreeBSD.org>	The SMP cleanup commit broke need_resched, this fixes that and also removed unncessary MPLOCKED and 'lock' prefixes from the interrupt nesting level, since (A) the MP lock is held at the time, and (B) since the neting level is restored prior to return any interrupted code will see a consistent value.
# 36e9f877	28-Mar-2000	Matthew Dillon <dillon@FreeBSD.org>	Commit major SMP cleanups and move the BGL (big giant lock) in the syscall path inward. A system call may select whether it needs the MP lock or not (the default being that it does need it). A great deal of conditional SMP code for various deadended experiments has been removed. 'cil' and 'cml' have been removed entirely, and the locking around the cpl has been removed. The conditional separately-locked fast-interrupt code has been removed, meaning that interrupts must hold the CPL now (but they pretty much had to anyway). Another reason for doing this is that the original separate-lock for interrupts just doesn't apply to the interrupt thread mechanism being contemplated. Modifications to the cpl may now ONLY occur while holding the MP lock. For example, if an otherwise MP safe syscall needs to mess with the cpl, it must hold the MP lock for the duration and must (as usual) save/restore the cpl in a nested fashion. This is precursor work for the real meat coming later: avoiding having to hold the MP lock for common syscalls and I/O's and interrupt threads. It is expected that the spl mechanisms and new interrupt threading mechanisms will be able to run in tandem, allowing a slow piecemeal transition to occur. This patch should result in a moderate performance improvement due to the considerable amount of code that has been removed from the critical path, especially the simplification of the spl*() calls. The real performance gains will come later. Approved by: jkh Reviewed by: current, bde (exception.s) Some work taken from: luoqi's patch
# f8515dd8	02-Jan-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Move the "sti" instruction to right before the "hlt" to close a tiny race condition. Obtained from: bde and/or obrien
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 28f31ccf	18-Aug-1999	Peter Wemm <peter@FreeBSD.org>	Use the MI process selection. We use a quick routine to decide whether to get the mplock and enter the kernel to run a process in the SMP case.
# eec2e836	10-Jul-1999	Bruce Evans <bde@FreeBSD.org>	Go back to the old (icu.s rev.1.7 1993) way of keeping the AST-pending bit separate from ipending, since this is simpler and/or necessary for SMP and may even be better for UP. Reviewed by: alc, luoqi, tegge
# ab001a72	08-Jul-1999	Jonathan Lemon <jlemon@FreeBSD.org>	Implement support for hardware debug registers on the i386. Submitted by: Brian Dean <brdean@unx.sas.com>
# 789fb7cc	03-Jul-1999	Alan Cox <alc@FreeBSD.org>	An SMP-specific change: Add the lock prefix to RMW operations on ipending.
# eb9d435a	01-Jun-1999	Jonathan Lemon <jlemon@FreeBSD.org>	Unifdef VM86. Reviewed by: silence on on -current
# 0f0fe5a4	12-May-1999	Luoqi Chen <luoqi@FreeBSD.org>	Unbreak VESA on SMP.
# ea2b3e3d	06-May-1999	Bruce Evans <bde@FreeBSD.org>	Fixed profiling of elf kernels. Made high resolution profiling compile for elf kernels (it is broken for all kernels due to lack of egcs support). Renaming of many assembler labels is avoided by declaring by declaring the labels that need to be visible to gprof as having type "function" and depending on the elf version of gprof being zealous about discarding the others. A few type declarations are still missing, mainly for SMP. PR: 9413 Submitted by: Assar Westerlund <assar@sics.se> (initial parts)
# 5206bca1	27-Apr-1999	Luoqi Chen <luoqi@FreeBSD.org>	Enable vmspace sharing on SMP. Major changes are, - %fs register is added to trapframe and saved/restored upon kernel entry/exit. - Per-cpu pages are no longer mapped at the same virtual address. - Each cpu now has a separate gdt selector table. A new segment selector is added to point to per-cpu pages, per-cpu global variables are now accessed through this new selector (%fs). The selectors in gdt table are rearranged for cache line optimization. - fask_vfork is now on as default for both UP and SMP. - Some aio code cleanup. Reviewed by: Alan Cox <alc@cs.rice.edu> John Dyson <dyson@iquest.net> Julian Elischer <julian@whistel.com> Bruce Evans <bde@zeta.org.au> David Greenman <dg@root.com>
# 087e80a9	02-Apr-1999	Alan Cox <alc@FreeBSD.org>	Put in place the infrastructure for improved UP and SMP TLB management. In particular, replace the unused field pmap::pm_flag by pmap::pm_active, which is a bit mask representing which processors have the pmap activated. (Thus, it is a simple Boolean on UPs.) Also, eliminate an unnecessary memory reference from cpu_switch() in swtch.s. Assisted by: John S. Dyson <dyson@iquest.net> Tested by: Luoqi Chen <luoqi@watermarkgroup.com>, Poul-Henning Kamp <phk@critter.freebsd.dk>
# 2618393e	20-Mar-1999	Alan Cox <alc@FreeBSD.org>	Eliminate a pointless TLB flush from the SMP idle loop. Submitted by: Luoqi Chen <luoqi@watermarkgroup.com> Reviewed by: "John S. Dyson" <toor@dyson.iquest.net>
# 6d2b6a08	17-Mar-1999	Jonathan Lemon <jlemon@FreeBSD.org>	Change btrl/btsl to cmpl/movl, since each cpu now has their own copy of private_tss, and there's no need to use a bit array. Also fixes the problem of using `je' after btrl, since cmpl sets ZF. Noticed by: Luoqi, on -current
# aa839b4b	28-Jul-1998	Bruce Evans <bde@FreeBSD.org>	Micro-optimized and cleaned up the clearing of switchtime in idle(). Cleaned up the conditionals in the disgusting SMP ifdef in idle().
# e796e00d	28-May-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Some cleanups related to timecounters and weird ifdefs in <sys/time.h>. Clean up (or if antipodic: down) some of the msgbuf stuff. Use an inline function rather than a macro for timecounter delta. Maintain process "on-cpu" time as 64 bits of microseconds to avoid needless second rollover overhead. Avoid calling microuptime the second time in mi_switch() if we do not pass through _idle in cpu_switch() This should reduce our context-switch overhead a bit, in particular on pre-P5 and SMP systems. WARNING: Programs which muck about with struct proc in userland will have to be fixed. Reviewed, but found imperfect by: bde
# daa2c78f	19-May-1998	Peter Dufault <dufault@FreeBSD.org>	Remove option for SCHED_FIFO. With this optional, SCHED_FIFO is the same as RTPRIO_IDLE when it falls through to the default.
# cfa5644b	12-May-1998	John Dyson <dyson@FreeBSD.org>	Some temporary fixes to SMP to make it more scheduling and signal friendly. This is a result of discussions on the mailing lists. Kudos to those who have found the issue and created work-arounds. I have chosen Tor's fix for now, before we can all work the issue more completely. Submitted by: Tor Egge
# 74164362	06-Apr-1998	Peter Wemm <peter@FreeBSD.org>	_curpcb is always defined in globals.s instead of here in #ifdefs
# 8a6472b7	28-Mar-1998	Peter Dufault <dufault@FreeBSD.org>	Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and _KPOSIX_PRIORITY_SCHEDULING options to work. Changes: Change all "posix4" to "p1003_1b". Misnamed files are left as "posix4" until I'm told if I can simply delete them and add new ones; Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux; Add man pages for _POSIX_PRIORITY_SCHEDULING system calls; Add options to LINT; Minor fixes to P1003_1B code during testing.
# f3df61a1	04-Mar-1998	Peter Dufault <dufault@FreeBSD.org>	Reviewed by: msmith, bde long ago Fix for RTPRIO scheduler to eliminate invalid context switches.
# 0b08f5f7	05-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Back out DIAGNOSTIC changes.
# 47cfdb16	04-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Turn DIAGNOSTIC into a new-style option.
# 5c623cb6	14-Dec-1997	Tor Egge <tegge@FreeBSD.org>	Add support for low resolution SMP kernel profiling. - A nonprofiling version of s_lock (called s_lock_np) is used by mcount. - When profiling is active, more registers are clobbered in seemingly simple assembly routines. This means that some callers needed to save/restore extra registers. - The stack pointer must have space for a 'fake' return address in idle, to avoid stack underflow.
# 82566551	13-Dec-1997	John Dyson <dyson@FreeBSD.org>	After one of my analysis passes to evaluate methods for SMP TLB mgmt, I noticed some major enhancements available for UP situations. The number of UP TLB flushes is decreased much more than significantly with these changes. Since a TLB flush appears to cost minimally approx 80 cycles, this is a "nice" enhancement, equiv to eliminating between 40 and 160 instructions per TLB flush. Changes include making sure that kernel threads all use the same PTD, and eliminate unneeded PTD switches at context switch time.
# 98823b23	10-Oct-1997	Peter Wemm <peter@FreeBSD.org>	Convert the VM86 option from a global option to an option only depended on by the files that use it. Changing the VM86 option now only causes a recompile of a dozen files or so rather than the entire kernel.
# dfd5aef3	21-Sep-1997	Peter Wemm <peter@FreeBSD.org>	Implement the parts needed for VM86 under SMP.
# 20233f27	07-Sep-1997	Steve Passe <fsmp@FreeBSD.org>	General cleanup of the lock pushdown code. They are grouped and enabled from machine/smptests.h: #define PUSHDOWN_LEVEL_1 #define PUSHDOWN_LEVEL_2 #define PUSHDOWN_LEVEL_3 #define PUSHDOWN_LEVEL_4_NOT
# bb36094c	05-Sep-1997	Peter Wemm <peter@FreeBSD.org>	Argh, what was I thinking?? Don't (yet) halt the CPU in the idle loop while waiting for an interrupt (rather than spinning on the runqueue status bits), since the other cpu can put stuff in there and the sleeping cpu may not get an interrupt for a while. When we have a reschedule IPI, this can come back. Pointed out by: fsmp
# 9a3b3e8b	26-Aug-1997	Peter Wemm <peter@FreeBSD.org>	Clean up the SMP AP bootstrap and eliminate the wretched idle procs. - We now have enough per-cpu idle context, the real idle loop has been revived (cpu's halt now with nothing to do). - Some preliminary support for running some operations outside the global lock (eg: zeroing "free but not yet zeroed pages") is present but appears to cause problems. Off by default. - the smp_active sysctl now behaves differently. It's merely a 'true/false' option. Setting smp_active to zero causes the AP's to halt in the idle loop and stop scheduling processes. - bootstrap is a lot safer. Instead of sharing a statically compiled in stack a number of times (which has caused lots of problems) and then abandoning it, we use the idle context to boot the AP's directly. This should help >2 cpu support since the bootlock stuff was in doubt. - print physical apic id in traps.. helps identify private pages getting out of sync. (You don't want to know how much hair I tore out with this!) More cleanup to follow, this is more of a checkpoint than a 'finished' thing.
# 48a09cf2	08-Aug-1997	John Dyson <dyson@FreeBSD.org>	VM86 kernel support. Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>, Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>, and probably alot of others. Submitted by: Jnathan Lemon <jlemon@americantv.com>
# 570dbb53	04-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Eliminate frequent silo overflows by restoring the TEST_LOPRIO code. This code was eliminated when the PEND_INTS algorithm was added. But it was discovered that PEND_INTS only worsen latency for FAST_INTR() routines, which can't be marked pending. Noticed & debugged by: dave adkins <adkin003@gold.tc.umn.edu>
# da9f0182	30-Jul-1997	Steve Passe <fsmp@FreeBSD.org>	Converted the TEST_LOPRIO code to default. Created mplock functions that save/restore NO registers. Minor cleanup.
# e31521c3	20-Jul-1997	Bruce Evans <bde@FreeBSD.org>	Removed unused #includes.
# 665bb8fa	14-Jul-1997	Steve Passe <fsmp@FreeBSD.org>	Tighten up asm code for TEST_PRIO and other misc. things. Use some new defines in place of "magic numbers".
# 057b294d	30-Jun-1997	Bruce Evans <bde@FreeBSD.org>	Un-inline a call to spl0(). It is not time critical, and was only inline because there was no non-inline spl0() to call. Don't frob intr_nesting_level in idle() or cpu_switch(). Interrupts are mostly disabled then, so the frobbing had little effect.
# b3196e4b	22-Jun-1997	Peter Wemm <peter@FreeBSD.org>	Preliminary support for per-cpu data pages. This eliminates a lot of #ifdef SMP type code. Things like _curproc reside in a data page that is unique on each cpu, eliminating the expensive macros like: #define curproc (SMPcurproc[cpunumber()]) There are some unresolved bootstrap and address space sharing issues at present, but Steve is waiting on this for other work. There is still some strictly temporary code present that isn't exactly pretty. This is part of a larger change that has run into some bumps, this part is standalone so it should be safe. The temporary code goes away when the full idle cpu support is finished. Reviewed by: fsmp, dyson
# 7b3c8424	06-Jun-1997	Bruce Evans <bde@FreeBSD.org>	Preserve %fs and %gs across context switches. This has a relatively low cost since it is only done in cpu_switch(), not for every exception. The extra state is kept in the pcb, and handled much like the npx state, with similar deficiencies (the state is not preserved across signal handlers, and error handling loses state).
# 5400ed3b	31-May-1997	Peter Wemm <peter@FreeBSD.org>	Include file updates.. <machine/spl.h> -> <machine/ipl.h>, add <machine/ipl.h> to those files that were depending on getting SWI_* implicitly via <machine/cpufunc.h>
# 288e2230	28-May-1997	Peter Wemm <peter@FreeBSD.org>	remove no longer needed opt_smp.h includes
# 7b2a188c	28-Apr-1997	Steve Passe <fsmp@FreeBSD.org>	cleaned out an old FIXME.
# 477a642c	26-Apr-1997	Peter Wemm <peter@FreeBSD.org>	Man the liferafts! Here comes the long awaited SMP -> -current merge! There are various options documented in i386/conf/LINT, there is more to come over the next few days. The kernel should run pretty much "as before" without the options to activate SMP mode. There are a handful of known "loose ends" that need to be fixed, but have been put off since the SMP kernel is in a moderately good condition at the moment. This commit is the result of the tinkering and testing over the last 14 months by many people. A special thanks to Steve Passe for implementing the APIC code!
# 9081eec1	22-Apr-1997	John Polstra <jdp@FreeBSD.org>	Make the necessary changes so that an ELF kernel can be built. I have successfully built, booted, and run a number of different ELF kernel configurations, including GENERIC. LINT also builds and links cleanly, though I have not tried to boot it. The impact on developers is virtually nil, except for two things. All linker sets that might possibly be present in the kernel must be listed in "sys/i386/i386/setdefs.h". And all C symbols that are also referenced from assembly language code must be listed in "sys/i386/include/asnames.h". It so happens that failure to do these things will have no impact on the a.out kernel. But it will break the build of the ELF kernel. The ELF bootloader works, but it is not ready to commit quite yet.
# a688c7b0	20-Apr-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Fix up the "hlt vector" change I made. Reviewed by: bde, bde, bde
# 3845d118	14-Apr-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Forget all about APM. Instead of "hlt" call through a vector which APM can then fiddle with. Default for the vector is to "htl; ret"
# a2a1c95c	07-Apr-1997	Peter Wemm <peter@FreeBSD.org>	The biggie: Get rid of the UPAGES from the top of the per-process address space. (!) Have each process use the kernel stack and pcb in the kvm space. Since the stacks are at a different address, we cannot copy the stack at fork() and allow the child to return up through the function call tree to return to user mode - create a new execution context and have the new process begin executing from cpu_switch() and go to user mode directly. In theory this should speed up fork a bit. Context switch the tss_esp0 pointer in the common tss. This is a lot simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer to each process's tss since the esp0 pointer is a 32 bit pointer, and the sd_base setting is split into three different bit sections at non-aligned boundaries and requires a lot of twiddling to reset. The 8K of memory at the top of the process space is now empty, and unmapped (and unmappable, it's higher than VM_MAXUSER_ADDRESS). Simplity the pmap code to manage process contexts, we no longer have to double map the UPAGES, this simplifies and should measuably speed up fork(). The following parts came from John Dyson: Set PG_G on the UPAGES that are now in kernel context, and invalidate them when swapping them out. Move the upages object (upobj) from the vmspace to the proc structure. Now that the UPAGES (pcb and kernel stack) are out of user space, make rfork(..RFMEM..) do what was intended by sharing the vmspace entirely via reference counting rather than simply inheriting the mappings.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# ebd707d3	16-Oct-1996	Bruce Evans <bde@FreeBSD.org>	Fixed miscounting for non-statistical (GUPROF) profiling: - use CROSSJUMP() and CROSSJUMP_LABEL() for conditional jumps from idle() into cpu_switch() and vice versa. - moved badsw code to after cpu_switch(). Cosmetic changes: - moved sw0 string to be immediately after its caller (badsw). - removed unused #include.
# e8993539	19-Sep-1996	Poul-Henning Kamp <phk@FreeBSD.org>	Add APM_IDLE_CPU option, that is off by default. I maintain that it saves more power to simply "hlt" the CPU than to spend tons of time trying to tell the APM bios to do the same. In particular if you do it 100 times a second...
# 85acc688	30-Jul-1996	Bruce Evans <bde@FreeBSD.org>	Eliminated pcb_inl. It was always 0 because context switches don't occur in interrupt handlers.
# b1508c72	31-Jul-1996	David Greenman <dg@FreeBSD.org>	Converted timer/run queues to 4.4BSD queue style. Removed old and unused sleep(). Implemented wakeup_one() which may be used in the future to combat the "thundering herd" problem for some special cases. Reviewed by: dyson
# 79df6d85	25-Jun-1996	Bruce Evans <bde@FreeBSD.org>	trap.c: Fixed profiling of system times. It was pre-4.4Lite and didn't support statclocks. System times were too small by a factor of 8. Handle deferred profiling ticks the 4.4Lite way: use addupc_task() instead of addupc(). Call addupc_task() directly instead of using the ADDUPC() macro. Removed vestigial support for PROFTIMER. switch.s: Removed addupc(). resourcevar.h: Removed ADDUPC() and declarations of addupc(). cpu.h: Updated a comment. i386's never were tahoe's, and the deferred profiling tick became (possibly) multiple ticks in 4.4Lite. Obtained from: mostly from NetBSD
# 93f4b1bf	25-Jun-1996	Bruce Evans <bde@FreeBSD.org>	Save John Polstra's initial fix for profiling for reference. The multiplication in addupc() overflowed for addresses >= 256K, assuming the usual profil(2) scale parameter of 0x8000. addupc() will go away soon. Submitted by: John Polstra <jdp@polstra.com>
# eabe0f9f	30-Apr-1996	Bruce Evans <bde@FreeBSD.org>	Don't return unused values in cpu_switch() or savectx(). Don't preserve unused registers in the NPX case in savectx().
# 68832d30	25-Apr-1996	Poul-Henning Kamp <phk@FreeBSD.org>	Fix cpu_fork for real. Suggested by: bde
# 7d239214	18-Apr-1996	Poul-Henning Kamp <phk@FreeBSD.org>	Fix a bogon. cpu_fork & savectx ecpected cpu_switch to restore %eax, they shouldn't.
# 0513ce7f	13-Apr-1996	Bruce Evans <bde@FreeBSD.org>	Use PCB_SAVEFPU_SIZE instead of a too-small size in savectx(). This bug only affected FPU emulators. It might have caused bogus FPU states in core dumps and in the child pcb after a fork. Emulated FPU states in core dumps don't work for other reasons, and the child FPU state is reinitialized by exec, so the problem might not have caused any noticeable affects. Cleaned up #includes.
# 8371872e	18-Mar-1996	Nate Williams <nate@FreeBSD.org>	Always enable interrupts before calling the APM idle/busy routines. Suggested by: phk@FreeBSD.org
# 44f0e01b	11-Mar-1996	Nate Williams <nate@FreeBSD.org>	Removed undocumented an unused APM_SLOWSTART code.
# 267173e7	04-Feb-1996	David Greenman <dg@FreeBSD.org>	Rewrote cpu_fork so that it doesn't use pmap_activate, and removed pmap_activate since it's not used anymore. Changed cpu_fork so that it uses one line of inline assembly rather than calling mvesp() to get the current stack pointer. Removed mvesp() since it is no longer being used.
# ac474627	02-Feb-1996	David Greenman <dg@FreeBSD.org>	Killed last change - it was bogus. cpu_switch() already assumes that return address is on the stack.
# b09fb643	29-Jan-1996	David Greenman <dg@FreeBSD.org>	savectx() strikes again: the saved stack pointer wasn't properly adjusted to remove the return address. It's only the frame pointer and luck that allowed the code to work at all.
# 2924d491	22-Jan-1996	David Greenman <dg@FreeBSD.org>	Simplified savectx() a little and fixed a bug that caused it to return garbage in the child process rather than "1" like it is supposed to. Reviewed by: bde
# db6a20e2	03-Jan-1996	Garrett Wollman <wollman@FreeBSD.org>	Converted two options over to the new scheme: USER_LDT and KTRACE.
# 32831552	21-Dec-1995	David Greenman <dg@FreeBSD.org>	Rewrote most of the ddb stack traceback code. These changes are smarter about decoding trap/syscall/interrupt frames and generally works better than the previous stuff. Removed some special (incorrect) frobbing of the frame pointer that was messing some things up with the new traceback code.
# 87b91157	10-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Staticize and cleanup. remove a TON of #includes from machdep.
# 7dfe504f	09-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Remove various unused symbols and procedures.
# a29b63cb	03-Sep-1995	John Dyson <dyson@FreeBSD.org>	Machine dependent routines to support pre-zeroed free pages. This significantly improves demand zero performance.
# 648c711b	16-Feb-1995	Poul-Henning Kamp <phk@FreeBSD.org>	This is the latest version of the APM stuff from HOSOKAWA, I have looked briefly over it, and see some serious architectural issues in this stuff. On the other hand, I doubt that we will have any solution to these issues before 2.1, so we might as well leave this in. Most of the stuff is bracketed by #ifdef's so it shouldn't matter too much in the normal case. Reviewed by: phk Submitted by: HOSOKAWA, Tatsumi <hosokawa@mt.cs.keio.ac.jp>
# e6891db8	21-Jan-1995	Bruce Evans <bde@FreeBSD.org>	Don't count context switches here, they are already counted in mi_switch().
# b39b673d	03-Dec-1994	Bruce Evans <bde@FreeBSD.org>	i386/exception.s, Keep track of interrupt nesting level. It is normally 0 for syscalls and traps, but is fudged to 1 for their exit processing in case they metamorphose into an interrupt handler. i386/genassym.c; Remove support for the obsolete pcb_iml and pcb_cmap2. Add support for pcb_inl. i386/swtch.s: Fudge the interrupt nesting level across context switches and in the idle loop so that the work for preemptive context switches gets counted as interrupt time, the work for voluntary context switches gets counted mostly as system time (the part when curproc == 0 gets counted as interrupt time), and only truly idle time gets counted as idle time. Remove obsolete support (commented out and otherwise) for pcb_iml. Load curpcb just before curproc instead of just after so that curpcb is always valid if curproc is. A few more changes like this may fix tracing through context switches. Remove obsolete function swtch_to_inactive(). include/cpu.h: Use the new interrupt nesting level variable to implement a non-fake CLF_INTR() so that accounting for the interrupt state works. You can use top, iostat or (best) an up to date systat to see interrupt overheads. I see the expected huge interrupt overheads for ISA devices (on a 486DX/33, about 55% for an IDE drive transferring 1250K/sec and the same for a WD8013EBT network card transferring 1100K/sec). The huge interrupt overheads for serial devices are unfortunately normally invisible. include/pcb.h: Remove the obsolete pcb_iml and pcb_cmap2. Replace them by padding to preserve binary compatibility. Use part of the new padding for pcb_inl. isa/icu.s: isa/vector.s: Keep track of interrupt nesting level.
# 54d02404	30-Oct-1994	Bruce Evans <bde@FreeBSD.org>	locore.s: Build a dummy frame at the top of tmpstk to help debuggers trace the stack when the system is idle. swtch.s: idle(): Initialize the frame pointer so that debuggers don't try to trace a bogus stack. Load the frame pointer, load the stack pointer and switch out the old stack in the unique order that never leaves one of the pointers pointers invalid so that debuggers can trace idle(). Disabling interrupts provides sufficient validity for normal operation, but debuggers use (trace) traps.
# a0181c75	25-Oct-1994	David Greenman <dg@FreeBSD.org>	Moved initialization of tmpstk so that it immediately follows the kernel text. Fixed rounding bug that caused the last page of kernel text to be read/write instead of read-only. This is important now that tmpstk can crash into it. Removed +4 bias of tmpstk because it screws up ddb's ability to traceback correctly.
# 7216391e	01-Oct-1994	David Greenman <dg@FreeBSD.org>	"idle priority" support. Based on code from Henrik Vestergaard Draboel, but substantially rewritten by me.
# 22414e53	30-Sep-1994	David Greenman <dg@FreeBSD.org>	Laptop Advanced Power Management support by HOSOKAWA Tatsumi. Submitted by: HOSOKAWA Tatsumi
# 35aee080	01-Sep-1994	David Greenman <dg@FreeBSD.org>	Converted P_LINK -> P_FORW, P_RLINK -> P_BACK, minor optimization.
# e8fb0b2c	31-Aug-1994	David Greenman <dg@FreeBSD.org>	Realtime priority scheduling support. Submitted by: Henrik Vestergaard Draboel
# 8fdce837	30-Aug-1994	Bruce Evans <bde@FreeBSD.org>	Don't define LOCORE (as nothing) in sources. It is now defined consistently (as 1) in Makefile.i386 for all assembler sources.
# 27774cbb	19-Aug-1994	David Greenman <dg@FreeBSD.org>	Removed bogus save of CMAP2.
# ed3f8954	06-Aug-1994	David Greenman <dg@FreeBSD.org>	Made the tmpstk start at tmpstk. Not doing so causes problems for the debugger. Submitted by: John Dyson
# d23d07ef	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Merged in post-1.1.5 work done by myself and John Dyson. This includes: me: 1) TLB flush optimization that effectively eliminates half of all of the TLB flushes. This works by only flushing the TLB when a page is "present" in memory (i.e. the valid bit is set in the page table entry). See section 5.3.5 of the Intel 386 Programmer's Reference Manual. 2) The handling of "CMAP" has been improved to catch attempts at multiple simultaneous use. John: 1) Added pmap_qenter/pmap_qremove functions for fast mapping of pages into the kernel. This is for future optimizations and support for the upcoming merged VM/buffer cache. Reviewed by: John Dyson
# 26f9a767	25-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# 0e195446	20-Apr-1994	David Greenman <dg@FreeBSD.org>	Bug fixes and performance improvements from John Dyson and myself: 1) check va before clearing the page clean flag. Not doing so was causing the vnode pager error 5 messages when paging from NFS. (pmap.c) 2) put back interrupt protection in idle_loop. Bruce didn't think it was necessary, John insists that it is (and I agree). (swtch.s) 3) various improvements to the clustering code (vm_machdep.c). It's now enabled/used by default. 4) bad disk blocks are now handled properly when doing clustered IOs. (wd.c, vm_machdep.c) 5) bogus bad block handling fixed in wd.c. 6) algorithm improvements to the pageout/pagescan daemons. It's amazing how well 4MB machines work now.
# d2306226	02-Apr-1994	David Greenman <dg@FreeBSD.org>	New interrupt code from Bruce Evans. In additional to Bruce's attached list of changes, I've made the following additional changes: 1) i386/include/ipl.h renamed to spl.h as the name conflicts with the file of the same name in i386/isa/ipl.h. 2) changed all use of mask (i.e. netmask, biomask, ttymask, etc) to _imask (net_imask, etc). 3) changed vestige of splnet use in if_is to splimp. 4) got rid of "impmask" completely (Bruce had gotten rid of netmask), and are now using net_imask instead. 5) dozens of minor cruft to glue in Bruce's changes. These require changes I made to config(8) as well, and thus it must be rebuilt. -DG from Bruce Evans: sio: o No diff is supplied. Remove the define of setsofttty(). I hope that is enough. .s: o i386/isa/debug.h no longer exists. The event counters became too much trouble to maintain. All function call entry and exception entry counters can be recovered by using profiling kernel (the new profiling supports all entry points; however, it is too slow to leave enabled all the time; it also). Only BDBTRAP() from debug.h is now used. That is moved to exception.s. It might be worth preserving SHOW_BITS() and calling it from _mcount() (if enabled). o T_ASTFLT is now only set just before calling trap(). o All exception handlers set SWI_AST_MASK in cpl as soon as possible after entry and arrange for _doreti to restore it atomically with exiting. It is not possible to set it atomically with entering the kernel, so it must be checked against the user mode bits in the trap frame before committing to using it. There is no place to store the old value of cpl for syscalls or traps, so there are some complications restoring it. Profiling stuff (mostly in .s): o Changes to kern/subr_mcount.c, gcc and gprof are not supplied yet. o All interesting labels `foo' are renamed `_foo' and all uninteresting labels `_bar' are renamed `bar'. A small change to gprof allows ignoring labels not starting with underscores. o MCOUNT_LABEL() is to provide names for counters for times spent in exception handlers. o FAKE_MCOUNT() is a version of MCOUNT() suitable for exception handlers. Its arg is the pc where the exception occurred. The new mcount() pretends that this was a call from that pc to a suitable MCOUNT_LABEL(). o MEXITCOUNT is to turn off any timer started by MCOUNT(). /usr/src/sys/i386/i386/exception.s: o The non-BDB BPTTRAP() macros were doing a sti even when interrupts were disabled when the trap occurred. The sti (fixed) sti is actually a no-op unless you have my changes to machdep.c that make the debugger trap gates interrupt gates, but fixing that would make the ifdefs messier. ddb seems to be unharmed by both interrupts always disabled and always enabled (I had the branch in the fix back to front for some time :-(). o There is no known pushal bug. o tf_err can be left as garbage for syscalls. /usr/src/sys/i386/i386/locore.s: o Fix and update BDE_DEBUGGER support. o ENTRY(btext) before initialization was dangerous. o Warm boot shot was longer than intended. /usr/src/sys/i386/i386/machdep.c: o DON'T APPLY ALL OF THIS DIFF. It's what I'm using, but may require other changes. Use the following: o Remove aston() and setsoftclock(). Maybe use the following: o No netisr.h. o Spelling fix. o Delay to read the Rebooting message. o Fix for vm system unmapping a reduced area of memory after bounds_check_with_label() reduces the size of a physical i/o for a partition boundary. A similar fix is required in kern_physio.c. o Correct use of __CONCAT. It never worked here for non- ANSI cpp's. Is it time to drop support for non-ANSI? o gdt_segs init. 0xffffffffUL is bogus because ssd_limit is not 32 bits. The replacement may have the same value :-), but is more natural. o physmem was one page too low. Confusing variable names. Don't use the following: o Better numbers of buffers. Each 8K page requires up to 16 buffer headers. On my system, this results in 5576 buffers containing [up to] 2854912 bytes of memory. The usual allocation of about 384 buffers only holds 192K of disk if you use it on an fs with a block size of 512. o gdt changes for bdb. o TGT -> IDT changes for bdb. o #ifdefed changes for bdb. /usr/src/sys/i386/i386/microtime.s: o Use the correct asm macros. I think asm.h was copied from Mach just for microtime and isn't used now. It certainly doesn't belong in <sys>. Various macros are also duplicated in sys/i386/boot.h and libc/i386/.h. o Don't switch to and from the IRR; it is guaranteed to be selected (default after ICU init and explicitly selected in isa.c too, and never changed until the old microtime clobbered it). /usr/src/sys/i386/i386/support.s: o Non-essential changes (none related to spls or profiling). o Removed slow loads of %gs again. The LDT support may require not relying on %gs, but loading it is not the way to fix it! Some places (copyin ...) forgot to load it. Loading it clobbers the user %gs. trap() still loads it after certain types of faults so that fuword() etc can rely on it without loading it explicitly. Exception handlers don't restore it. If we want to preserve the user %gs, then the fastest method is to not touch it except for context switches. Comparing with VM_MAXUSER_ADDRESS and branching takes only 2 or 4 cycles on a 486, while loading %gs takes 9 cycles and using it takes another. o Fixed a signed branch to unsigned. /usr/src/sys/i386/i386/swtch.s: o Move spl0() outside of idle loop. o Remove cli/sti from idle loop. sw1 does a cli, and in the unlikely event of an interrupt occurring and whichqs becoming zero, sw1 will just jump back to _idle. o There's no spl0() function in asm any more, so use splz(). o swtch() doesn't need to be superaligned, at least with the new mcounting. o Fixed a signed branch to unsigned. o Removed astoff(). /usr/src/sys/i386/i386/trap.c: o The decentralized extern decls were inconsistent, of course. o Fixed typo MATH_EMULTATE in comments. / o Removed unused variables. o Old netmask is now impmask; print it instead. Perhaps we should print some of the new masks. o BTW, trap() should not print anything for normal debugger traps. /usr/src/sys/i386/include/asmacros.h: o DON'T APPLY ALL OF THIS DIFF. Just use some of the null macros as necessary. /usr/src/sys/i386/include/cpu.h: o CLKF_BASEPRI() changes since cpl == SWI_AST_MASK is now normal while the kernel is running. o Don't use var++ to set boolean variables. It fails after a mere 4G times :-) and is slower than storing a constant on [3-4]86s. /usr/src/sys/i386/include/cpufunc.h: o DON'T APPLY ALL OF THIS DIFF. You need mainly the include of <machine/ipl.h>. Unfortunately, <machine/ipl.h> is needed by almost everything for the inlines. /usr/src/sys/i386/include/ipl.h: o New file. Defines spl inlines and SWI macros and declares most variables related to hard and soft interrupt masks. /usr/src/sys/i386/isa/icu.h: o Moved definitions to <machine/ipl.h> /usr/src/sys/i386/isa/icu.s: o Software interrupts (SWIs) and delayed hardware interrupts (HWIs) are now handled uniformally, and dispatching them from splx() is more like dispatching them from _doreti. The dispatcher is essentially (handler[ffs(ipending & ~cpl)](). o More care (not quite enough) is taken to avoid unbounded nesting of interrupts. o The interface to softclock() is changed so that a trap frame is not required. o Fast interrupt handlers are now handled more uniformally. Configuration is still too early (new handlers would require bits in <machine/ipl.h> and functions to vector.s). o splnnn() and splx() are no longer here; they are inline functions (could be macros for other compilers). splz() is the nontrivial part of the old splx(). /usr/src/sys/i386/isa/ipl.h o New file. Supposed to have only bus-dependent stuff. Perhaps the h/w masks should be declared here. /usr/src/sys/i386/isa/isa.c: o DON'T APPLY ALL OF THIS DIFF. You need only things involving mask and MASK and comments about them. netmask is now a pure software mask. It works like the softclock mask. /usr/src/sys/i386/isa/vector.s: o Reorganize AUTO_EOI macros. o Option FAST_INTR_HANDLER_USERS_ES for people who don't trust fastintr handlers. o fastintr handlers need to metamorphose into ordinary interrupt handlers if their SWI bit has become set. Previously, sio had unintended latency for handling output completions and input of SLIP framing characters because this was not done. /usr/src/sys/net/netisr.h: o The machine-dependent stuff is now imported from <machine/ipl.h>. /usr/src/sys/sys/systm.h o DON'T APPLY ALL OF THIS DIFF. You need mainly the different splx() prototype. The spl*() prototypes are duplicated as inlines in <machine/ipl.h> but they need to be duplicated here in case there are no inlines. I sent systm.h and cpufunc.h to Garrett. We agree that spl0 should be replaced by splnone and not the other way around like I've done. /usr/src/sys/kern/kern_clock.c o splsoftclock() now lowers cpl so the direct call to softclock() works as intended. o softclock() interface changed to avoid passing the whole frame (some machines may need another change for profile_tick()). o profiling renamed _profiling to avoid ANSI namespace pollution. (I had to improve the mcount() interface and may as well fix it.) The GUPROF variant doesn't actually reference profiling here, but the 'U' in GUPROF should mean to select the microtimer mcount() and not change the interface.
# da59a31c	31-Jan-1994	David Greenman <dg@FreeBSD.org>	WINE/user LDT support from John Brezak, ported to FreeBSD by Jeffrey Hsu <hsu@soda.berkeley.edu>.
# d64f660f	17-Jan-1994	David Greenman <dg@FreeBSD.org>	Improvements mostly from John Dyson, with a little bit from me. * Removed pmap_is_wired * added extra cli/sti protection in idle (swtch.s) * slight code improvement in trap.c * added lots of comments * improved paging and other algorithms in VM system
# 7f8cb368	14-Jan-1994	David Greenman <dg@FreeBSD.org>	"New" VM system from John Dyson & myself. For a run-down of the major changes, see the log of any effected file in the sys/vm directory (swap_pager.c for instance).
# 0967373e	12-Nov-1993	David Greenman <dg@FreeBSD.org>	First steps in rewriting locore.s, and making info useful when the machine panics. i386/i386/locore.s: 1) got rid of most .set directives that were being used like #define's, and replaced them with appropriate #define's in the appropriate header files (accessed via genassym). 2) added comments to header inclusions and global definitions, and global variables 3) replaced some hardcoded constants with cpp defines (such as PDESIZE and others) 4) aligned all comments to the same column to make them easier to read 5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to /sys/i386/include/asmacros.h 6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code 7) added new global '_KERNend' to store last location+1 of kernel 8) cleaned up zeroing of bss so that only bss is zeroed 9) fix zeroing of page tables so that it really does zero them all - not just if they follow the bss. 10) rewrote page table initialization code so that 1) works correctly and 2) write protects the kernel text by default 11) properly initialize the kernel page directory, upages, p0stack PT, and page tables. The previous scheme was more than a bit screwy. 12) change allocation of virtual area of IO hole so that it is fixed at KERNBASE + 0xa0000. The previous scheme put it right after the kernel page tables and then later expected it to be at KERNBASE +0xa0000 13) change multiple bogus settings of user read/write of various areas of kernel VM - including the IO hole; we should never be accessing the IO hole in user mode through the kernel page tables 14) split kernel support routines such as bcopy, bzero, copyin, copyout, etc. into a seperate file 'support.s' 15) split swtch and related routines into a seperate 'swtch.s' 16) split routines related to traps, syscalls, and interrupts into a seperate file 'exception.s' 17) remove some unused global variables from locore that got inserted by Garrett when he pulled them out of some .h files. i386/isa/icu.s: 1) clean up global variable declarations 2) move in declaration of astpending and netisr i386/i386/pmap.c: 1) fix calculation of virtual_avail. It previously was calculated to be right in the middle of the kernel page tables - not a good place to start allocating kernel VM. 2) properly allocate kernel page dir/tables etc out of kernel map - previously only took out 2 pages. i386/i386/machdep.c: 1) modify boot() to print a warning that the system will reboot in PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user abort with a key on the console. The machine will wait for ever if a key is typed before the reboot. The default is 15 seconds, but can be set to 0 to mean don't wait at all, -1 to mean wait forever, or any positive value to wait for that many seconds. 2) print "Rebooting..." just before doing it. kern/subr_prf.c: 1) remove PANICWAIT as it is deprecated by the change to machdep.c i386/i386/trap.c: 1) add table of trap type strings and use it to print a real trap/ panic message rather than just a number. Lot's of work to be done here, but this is the first step. Symbolic traceback is in the TODO. i386/i386/Makefile.i386: 1) add support in to build support.s, exception.s and swtch.s ...and various changes to various header files to make all of the above happen.