Cross Reference: /freebsd-current/sys/amd64/amd64/exception.S

History log of /freebsd-current/sys/amd64/amd64/exception.S
Revision	Date	Author	Comments
# 8551c31b	11-Apr-2024	Elyes Haouas <ehaouas@noos.fr>	exception: Fix typos Signed-off-by: Elyes Haouas <ehaouas@noos.fr> Reviewed by: imp Pull Request: https://github.com/freebsd/freebsd-src/pull/885
# 95ee2897	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: two-line .h pattern Remove /^\s\\n \*\s+\$FreeBSD\$$\n/
# c6d31b83	18-Jul-2022	Konstantin Belousov <kib@FreeBSD.org>	AST: rework Make most AST handlers dynamically registered. This allows to have subsystem-specific handler source located in the subsystem files, instead of making subr_trap.c aware of it. For instance, signal delivery code on return to userspace is now moved to kern_sig.c. Also, it allows to have some handlers designated as the cleanup (kclear) type, which are called both at AST and on thread/process exit. For instance, ast(), exit1(), and NFS server no longer need to be aware about UFS softdep processing. The dynamic registration also allows third-party modules to register AST handlers if needed. There is one caveat with loadable modules: the code does not make any effort to ensure that the module is not unloaded before all threads processed through AST handler in it. In fact, this is already present behavior for hwpmc.ko and ufs.ko. I do not think it is worth the efforts and the runtime overhead to try to fix it. Reviewed by: markj Tested by: emaste (arm64), pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888
# 7aa47cac	25-Aug-2021	Konstantin Belousov <kib@FreeBSD.org>	amd64: remove lfence after swapgs on syscall entry According to the description of SBSS issue at https://software.intel.com/content/www/us/en/develop/articles/software-security-guidance/technical-documentation/speculative-behavior-swapgs-and-segment-registers.html lfence after swapgs is needed only for the case when swapgs could be speculatively executed. Since syscall entry, unlike exception and interrupt entries, executes swapgs unconditionally, there is no opportunity for speculation. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31682
# b0f71f1b	10-Aug-2021	Mark Johnston <markj@FreeBSD.org>	amd64: Add MD bits for KMSAN Interrupt and exception handlers must call kmsan_intr_enter() prior to calling any C code. This is because the KMSAN runtime maintains some TLS in order to track initialization state of function parameters and return values across function calls. Then, to ensure that this state is kept consistent in the face of asynchronous kernel-mode excpeptions, the runtime uses a stack of TLS blocks, and kmsan_intr_enter() and kmsan_intr_leave() push and pop that stack, respectively. Use these functions in amd64 interrupt and exception handlers. Note that handlers for user->kernel transitions need not be annotated. Also ensure that trap frames pushed by the CPU and by handlers are marked as initialized before they are used. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31467
# aa3ea612	31-Mar-2021	Konstantin Belousov <kib@FreeBSD.org>	x86: remove gcov kernel support Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D29529
# 098dbd7f	26-Dec-2020	Konstantin Belousov <kib@FreeBSD.org>	amd64 nmi handler: fix comment about %ebx Reported by: markj MFC after: 3 days
# d39f7430	25-Dec-2020	Konstantin Belousov <kib@FreeBSD.org>	amd64: preserve %cr2 in NMI/MCE/DBG handlers. These handlers could interrupt code which has interrupts disabled, and if a spurious page fault occurs during exception handler run, we get clobbered %cr2 in higher level stack. This is mostly a speculation, but it is based on hints from good sources. MFC after: 1 week Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27772
# 0675f4e1	19-Jul-2020	Konstantin Belousov <kib@FreeBSD.org>	Simplify non-pti syscall entry on amd64. Limit manipulations to use %rax as scratch to the pti portion of the syscall entry code. Submitted by: alc Reviewed by: markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D25722
# 3ec7e169	18-Jul-2020	Konstantin Belousov <kib@FreeBSD.org>	amd64 pmap: microoptimize local shootdowns for PCID PTI configurations When pmap operates in PTI mode, we must reload %cr3 on return to userspace. In non-PCID mode the reload always flushes all non-global TLB entries and we take advantage of it by only invalidating the KPT TLB entries (there is no cached UPT entries at all). In PCID mode, we flush both KPT and UPT TLB explicitly, but we can take advantage of the fact that PCID mode command to reload %cr3 includes a flag to flush/not flush target TLB. In particular, we can avoid the flush for UPT, instead record that load of pc_ucr3 into %cr3 on return to usermode should be flushing. This is done by providing either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask. The mask is automatically reset to all-1s on return to usermode. Similarly, we can avoid flushing UPT TLB on context switch, replacing it by setting pc_ucr3_load_mask. This unifies INVPCID and non-INVPCID PTI ifunc, leaving only 4 cases instead of 6. This trick is also applicable both to the TLB shootdown IPI handlers, since handlers interrupt the target thread. But then we need to check pc_curpmap in handlers, and this would reopen the same race for INVPCID machines as was fixed in r306350 for non-INVPCID. To not introduce the same bug, unconditionally do spinlock_enter() in pmap_activate(). Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D25483
# da248a69	20-Nov-2019	Konstantin Belousov <kib@FreeBSD.org>	amd64: in double fault handler, do not rely on sane gsbase value. Typical reasons for doublefault faults are either kernel stack overflow or bugs in the code that manipulates protection CPU state. The later code is the code which often has to set up gsbase for kernel. Switching to explicit load of GSBASE MSR in the fault handler makes it more probable to output a useful information. Now all IST handlers have nmi_pcpu structure on top of their stacks. It would be even more useful to save gsbase value at the moment of the fault. I did not this because I do not want to modify PCB layout now. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# c4f056e8	13-Nov-2019	Konstantin Belousov <kib@FreeBSD.org>	amd64: only set PCB_FULL_IRET pcb flag when #gp or similar exception comes from usermode. If CPU supports RDFSBASE, the flag also means that userspace fsbase and gsbase are already written into pcb, which might be not true when we handle #gp from kernel. The offender is rdmsr_safe(), and the visible result is corrupted userspace TLS base. Reported by: pstef Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 83ba1468	03-Nov-2019	Konstantin Belousov <kib@FreeBSD.org>	amd64: Store %cr3 into pcpu saved_ucr3 on double fault. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 90e35b0a	06-Aug-2019	Konstantin Belousov <kib@FreeBSD.org>	amd64: prevents speculations over swapgs reload of %gs base. Such speculations could use user-controlled %gs base, esp. since FreeBSD supports WRGSBASE instructions. Place LFENCEs on entry for each basic block after the test for previous kernel/user mode on the kernel entry, which prevents the speculation. Code accesses %gs-based PCPU before any serialization instructions are executed, like %cr3 reload for KPTI. With pti disabled, on haswell i7-4770S machine, "syscall_timings getppid" shows when no lfence is added to syscall path: test loop time iterations periteration getppid 0 1.040918865 4643611 0.000000224 getppid 1 1.004985962 4481816 0.000000224 getppid 2 1.005196483 4482363 0.000000224 with lfence: getppid 0 1.043701091 4554779 0.000000229 getppid 1 1.016930328 4438094 0.000000229 getppid 2 1.023223117 4466640 0.000000229 and ministat reports 'No difference proven at 95.0% confidence.' Security: CVE-2019-1125 Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 1947b298	03-Aug-2019	Konstantin Belousov <kib@FreeBSD.org>	amd64: Streamline exceptions and interrupts handlers. PTI-mode entry points were coded to set up the environment identical to non-PTI entry and then fall-through to non-PTI handlers, mostly. This has the drawback of requiring two more SWAPGS, first to access PCPU, and then to return to the state expected by the non-PTI entry point. Eliminate the duplication by doing more in entry stubs both for PTI and non-PTI, and adjusting the common code to expect that SWAPGS and some minimal registers saving is done by entries. Some less often used entries, in particular, #GP, #NP, and #SS, which can fault on doreti, are left as is because there are basically four variants of entrance, and they are not performance-critical, esp. comparing with e.g. #PF or interrupts. Reviewed by: markj (previous version) Tested by: pho (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation
# 7355a02b	14-May-2019	Konstantin Belousov <kib@FreeBSD.org>	Mitigations for Microarchitectural Data Sampling. Microarchitectural buffers on some Intel processors utilizing speculative execution may allow a local process to obtain a memory disclosure. An attacker may be able to read secret data from the kernel or from a process when executing untrusted code (for example, in a web browser). Reference: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00233.html Security: CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091 Security: FreeBSD-SA-19:07.mds Reviewed by: jhb Tested by: emaste, lwhsu Approved by: so (gtetlow)
# 762138f7	05-Feb-2019	Konstantin Belousov <kib@FreeBSD.org>	amd64: clear callee-preserved registers on syscall exit. %r8, %r10, and on non-KPTI configuration %r9 were not restored on fast return from a syscall. Reviewed by: markj Approved by: so Security: CVE-2019-5595 Sponsored by: The FreeBSD Foundation MFC after: 0 minutes
# c1141fba	19-Aug-2018	Konstantin Belousov <kib@FreeBSD.org>	Update L1TF workaround to sustain L1D pollution from NMI. Current mitigation for L1TF in bhyve flushes L1D either by an explicit WRMSR command, or by software reading enough uninteresting data to fully populate all lines of L1D. If NMI occurs after either of methods is completed, but before VM entry, L1D becomes polluted with the cache lines touched by NMI handlers. There is no interesting data which NMI accesses, but something sensitive might be co-located on the same cache line, and then L1TF exposes that to a rogue guest. Use VM entry MSR load list to ensure atomicity of L1D cache and VM entry if updated microcode was loaded. If only software flush method is available, try to help the bhyve sw flusher by also flushing L1D on NMI exit to kernel mode. Suggested by and discussed with: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D16790
# b3a7db3b	29-Jul-2018	Konstantin Belousov <kib@FreeBSD.org>	Use SMAP on amd64. Ifuncs selectors dispatch copyin(9) family to the suitable variant, to set rflags.AC around userspace access. Rflags.AC bit is cleared in all kernel entry points unconditionally even on machines not supporting SMAP. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D13838
# dbe30617	01-Jun-2018	Bruce Evans <bde@FreeBSD.org>	Fix recent breakages of kernel profiling, mostly on i386 (high resolution kernel profiling remains broken). memmove() was broken using ALTENTRY(). ALTENTRY() is only different from ENTRY() in the profiling case, and its use in that case was sort of backwards. The backwardness magically turned memmove() into memcpy() instead of completely breaking it. Only the high resolution parts of profiling itself were broken. Use ordinary ENTRY() for memmove(). Turn bcopy() into a tail call to memmove() to reduce complications. This gives slightly different pessimizations and profiling lossage. The pessimizations are minimized by not using a frame pointer() for bcopy(). Calls to profiling functions from exception trampolines were not relocated. This caused crashes on the first exception. Fix this using function pointers. Addresses of exception handlers in trampolines were not relocated. This caused unknown offsets in the profiling data. Relocate by abusing setidt_disp as for pmc although this is slower than necessary and requires namespace pollution. pmc seems to be missing some relocations. Stack traces and lots of other things in debuggers need similar relocations. Most user addresses were misclassified as unknown kernel addresses and then ignored. Treat all unknown addresses as user. Now only user addresses in the kernel text range are significantly misclassified (as known kernel addresses). The ibrs functions didn't preserve enough registers. This is the only recent breakage on amd64. Although these functions are written in asm, in the profiling case they call profiling functions which are mostly for the C ABI, so they only have to save call-used registers. They also have to save arg and return registers in some cases and actually save them in all cases to reduce complications. They end up saving all registers except %ecx on i386 and %r10 and %r11 on amd64. Saving these is only needed for 1 caller on each of amd64 and i386. Save them there. This is slightly simpler. Remove saving %ecx in handle_ibrs_exit on i386. Both handle_ibrs_entry and handle_ibrs_exit use %ecx, but only the latter needed to or did save it. But saving it there doesn't work for the profiling case. amd64 has more automatic saving of the most common scratch registers %rax, %rcx and %rdx (its complications for %r10 are from unusual use of %r10 by SYSCALL). Thus profiling of handle_ibrs_exit_rs() was not broken, and I didn't simplify the saving by moving the saving of these registers from it to the caller.
# 053641bb	08-May-2018	Konstantin Belousov <kib@FreeBSD.org>	Prepare DB# handler for deferred trigger of watchpoints. Since pop %ss/mov %ss instructions defer all interrupts and exceptions for the next instruction, it is possible that the userspace watchpoint trap executes on the first instruction of the kernel entry for syscall/bpt. In this case, DB# should be treated similarly to NMI: on amd64 we must always load GSBASE even if the trap comes from kernel mode, and load the kernel page table root into %cr3. Moreover, the trap must use the dedicated stack, because we are still on the user stack when trapped on syscall entry. For i386, we must reload %cr3. The syscall instruction is not configured, so there is no issue with executing on user stack when trapping. Due to some CPU erratas it is not always possible to detect that the userspace watchpoint triggered by inspecting %dr6. In trap(), compare the trap %rip with the known unsafe entry points and if matched pretend that the watchpoint did not fire at all. Thank you to the MSRC Incident Response Team, and in particular Greg Lenti and Nate Warfield, for coordinating the response to this issue across multiple vendors. Thanks to Computer Recycling at The Working Center of Kitchener for making hardware available to allow us to test the patch on additional CPU families. Reviewed by: jhb Discussed with: Matthew Dillon Tested by: emaste Sponsored by: The FreeBSD Foundation Security: CVE-2018-8897 Security: FreeBSD-SA-18:06.debugreg
# 27275f8a	26-Apr-2018	Tycho Nightingale <tychon@FreeBSD.org>	Expand the checks for UCR3 == PMAP_NO_CR3 to enable processes to be excluded from PTI. Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15100
# 19c5cea3	25-Apr-2018	Tycho Nightingale <tychon@FreeBSD.org>	If a trap is encountered upon executing iretq from within doreti() the hardware will ensure the stack pointer is aligned to a 16-byte boundary before saving the fault state on the stack. In the PTI case, handle this potential alignment adjustment by copying both frames independently while unwinding the stack in between. Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15183
# 6469bdcd	06-Apr-2018	Brooks Davis <brooks@FreeBSD.org>	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941
# fc2a8776	20-Mar-2018	Ed Maste <emaste@FreeBSD.org>	Rename assym.s to assym.inc assym is only to be included by other .s files, and should never actually be assembled by itself. Reviewed by: imp, bdrewery (earlier) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D14180
# 319117fd	31-Jan-2018	Konstantin Belousov <kib@FreeBSD.org>	IBRS support, AKA Spectre hardware mitigation. It is coded according to the Intel document 336996-001, reading of the patches posted on lkml, and some additional consultations with Intel. For existing processors, you need a microcode update which adds IBRS CPU features, and to manually enable it by setting the tunable/sysctl hw.ibrs_disable to 0. Current status can be checked in sysctl hw.ibrs_active. The mitigation might be inactive if the CPU feature is not patched in, or if CPU reports that IBRS use is not required, by IA32_ARCH_CAP_IBRS_ALL bit. Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14029
# b4dfc9d7	19-Jan-2018	Konstantin Belousov <kib@FreeBSD.org>	PTI: Trap if we returned to userspace with kernel (full) page table still active. Map userspace portion of VA in the PTI kernel-mode page table as non-executable. This way, if we ever miss reloading ucr3 into %cr3 on the return to usermode, the process traps instead of executing in potentially vulnerable setup. Catch the condition of such trap and verify user-mode %cr3, which is saved by page fault handler. I peek this trick in some article about Linux implementation. Reviewed by: alc, markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 12 days DIfferential revision: https://reviews.freebsd.org/D13956
# 68fd3b0e	18-Jan-2018	John Baldwin <jhb@FreeBSD.org>	Use a dedicated per-CPU stack for machine check exceptions. Similar to NMIs, machine check exceptions can fire at any time and are not masked by IF. This means that machine checks can fire when the kstack is too deep to hold a trap frame, or at critical sections in trap handlers when a user %gs is used with a kernel %cs. Use the same strategy used for NMIs of using a dedicated per-CPU stack configured in IST 3. Store the CPU's pcpu pointer at the stop of the stack so that the machine check handler can reliably find the proper value for %gs (also borrowed from NMIs). This should also fix a similar issue with PTI with a MC# occurring while the CPU is executing on the trampoline stack. While here, bypass trap() entirely and just call mca_intr(). This avoids a bogus call to kdb_reenter() (there's no reason to try to reenter kdb if a MC# is raised). Reviewed by: kib Tested by: avg (on AMD without PTI) Differential Revision: https://reviews.freebsd.org/D13962
# f36b1fe0	18-Jan-2018	John Baldwin <jhb@FreeBSD.org>	Remove two no-longer-used labels from the NMI interrupt handler. Reviewed by: kib
# 7f513d17	18-Jan-2018	John Baldwin <jhb@FreeBSD.org>	Adjust branch target in NMI handler for the !PTI case. In the !PTI case the NMI handler jumped past the instructions that set %rdi to point to the current PCB, but the target instructions assumed %rdi were set. Reviewed by: kib Tested by: pho
# 9cb26f73	17-Jan-2018	Mark Johnston <markj@FreeBSD.org>	Annotate a couple of changes from r328083. Reviewed by: kib X-MFC with: r328083
# bd50262f	17-Jan-2018	Konstantin Belousov <kib@FreeBSD.org>	PTI for amd64. The implementation of the Kernel Page Table Isolation (KPTI) for amd64, first version. It provides a workaround for the 'meltdown' vulnerability. PTI is turned off by default for now, enable with the loader tunable vm.pmap.pti=1. The pmap page table is split into kernel-mode table and user-mode table. Kernel-mode table is identical to the non-PTI table, while usermode table is obtained from kernel table by leaving userspace mappings intact, but only leaving the following parts of the kernel mapped: kernel text (but not modules text) PCPU GDT/IDT/user LDT/task structures IST stacks for NMI and doublefault handlers. Kernel switches to user page table before returning to usermode, and restores full kernel page table on the entry. Initial kernel-mode stack for PTI trampoline is allocated in PCPU, it is only 16 qwords. Kernel entry trampoline switches page tables. then the hardware trap frame is copied to the normal kstack, and execution continues. IST stacks are kept mapped and no trampoline is needed for NMI/doublefault, but of course page table switch is performed. On return to usermode, the trampoline is used again, iret frame is copied to the trampoline stack, page tables are switched and iretq is executed. The case of iretq faulting due to the invalid usermode context is tricky, since the frame for fault is appended to the trampoline frame. Besides copying the fault frame and original (corrupted) frame to kstack, the fault frame must be patched to make it look as if the fault occured on the kstack, see the comment in doret_iret detection code in trap(). Currently kernel pages which are mapped during trampoline operation are identical for all pmaps. They are registered using pmap_pti_add_kva(). Besides initial registrations done during boot, LDT and non-common TSS segments are registered if user requested their use. In principle, they can be installed into kernel page table per pmap with some work. Similarly, PCPU can be hidden from userspace mapping using trampoline PCPU page, but again I do not see much benefits besides complexity. PDPE pages for the kernel half of the user page tables are pre-allocated during boot because we need to know pml4 entries which are copied to the top-level paging structure page, in advance on a new pmap creation. I enforce this to avoid iterating over the all existing pmaps if a new PDPE page is needed for PTI kernel mappings. The iteration is a known problematic operation on i386. The need to flush hidden kernel translations on the switch to user mode make global tables (PG_G) meaningless and even harming, so PG_G use is disabled for PTI case. Our existing use of PCID is incompatible with PTI and is automatically disabled if PTI is enabled. PCID can be forced on only for developer's benefit. MCE is known to be broken, it requires IST stack to operate completely correctly even for non-PTI case, and absolutely needs dedicated IST stack because MCE delivery while trampoline did not switched from PTI stack is fatal. The fix is pending. Reviewed by: markj (partially) Tested by: pho (previous version) Discussed with: jeff, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 4975c202	10-Jan-2018	Konstantin Belousov <kib@FreeBSD.org>	Do not clear %RFLAGS.DF on fast syscall entry. Hardware already did it for us due to the mask loaded into the MSR_SF_MASK msr register. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13838
# 0d4e7ec5	26-Aug-2017	Ryan Libby <rlibby@FreeBSD.org>	amd64: drop q suffix from rd[fg]sbase for gas compatibility Reviewed by: kib Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12133
# 3e902b3d	21-Aug-2017	Konstantin Belousov <kib@FreeBSD.org>	Make WRFSBASE and WRGSBASE instructions functional. Right now, we enable the CR4.FSGSBASE bit on CPUs which support the facility (Ivy and later), to allow usermode to read fs and gs bases without syscalls. This bit also controls the write access to bases from userspace, but WRFSBASE and WRGSBASE instructions currently cannot be used, because return path from both exceptions or interrupts overrides bases with the values from pcb. Supporting the instructions is useful because this means that usermode can implement green-threads completely in userspace without issuing syscalls to change all of the machine context. Support is implemented by saving the fs base and user gs base when PCB_FULL_IRET flag is set. The flag is set on the context switch, which potentially causes clobber of the bases due to activation of another context, and when explicit modification of the user context by a syscall or exception handler is performed. In particular, the patch moves setting of the flag before syscalls change context. The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on entry from userspace can be considered a bug fixes on its own. Reviewed by: jhb (previous version) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D12023
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# edafb5a3	03-May-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	sys/amd64: Small spelling fixes. No functional change.
# 44784411	13-Apr-2016	John Baldwin <jhb@FreeBSD.org>	Expose doreti as a global symbol on amd64 and i386. doreti provides the common code path for returning from interrupt andlers on x86. Exposing doreti as a global symbol allows kernel modules to include low-level interrupt handlers instead of requiring all low-level handlers to be statically compiled into the kernel. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: kib
# 9054bcbc	12-Apr-2016	Andriy Gapon <avg@FreeBSD.org>	[amd64] dtrace_invop handler is to be called only for kernel exceptions DTrace-related exceptions in userland code are handled elsewhere. One practical problem was a crash in dtrace_invop_start() when saved %rsp pointed to a virtual address that was not backed. i386 code already ignored userland exceptions. Reviewed by: markj, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D5906
# 21be12e0	27-Aug-2015	Mark Johnston <markj@FreeBSD.org>	Remove an unneeded instruction. MFC after: 1 week
# 2d45c2d5	16-Dec-2014	Konstantin Belousov <kib@FreeBSD.org>	The iret instruction may generate #np and #ss fault, besides #gp. When returning to usermode, the handler for that exceptions is also executed with wrong gs base. Handle all three possible faults in the same way, checking for iret fault, and performing full iret. Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 5a5f9d21	18-Jul-2014	Mark Johnston <markj@FreeBSD.org>	Use a C wrapper for trap() instead of checking and calling the DTrace trap hook in assembly. Suggested by: kib Reviewed by: kib (original version) X-MFC-With: r268600
# 291624fd	13-Jul-2014	Mark Johnston <markj@FreeBSD.org>	Invoke the DTrace trap handler before calling trap() on amd64. This matches the upstream implementation and helps ensure that a trap induced by tracing fbt::trap:entry is handled without recursively generating another trap. This makes it possible to run most (but not all) of the DTrace tests under common/safety/ without triggering a kernel panic. Submitted by: Anton Rang <anton.rang@isilon.com> (original version) Phabric: D95
# 64e97265	29-May-2014	Konstantin Belousov <kib@FreeBSD.org>	When usermode loaded non-default segment selector into the %gs, correctly prepare KGSBASE msr to restore the user descriptor base on the last swapgs during return to usermode. Reported and tested by: peterj Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 54366c0b	25-Nov-2013	Attilio Rao <attilio@FreeBSD.org>	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
# c788f925	18-Jun-2013	Konstantin Belousov <kib@FreeBSD.org>	Some clarifications and updates for the comments, mostly retrieved from Bruce Evans. Trim the trailing spaces. MFC after: 1 week
# 6806ce6e	27-May-2013	Konstantin Belousov <kib@FreeBSD.org>	When handling an exception from the attempt from loading the faulting context on return from the trap handler, re-enable the interrupts on i386 and amd64. The trap return path have to disable interrupts since the sequence of loading the machine state is not atomic. The trap() function which transfers the control to the special handler would enable the interrupt, but an iret loads the previous eflags with PSL_I clear. Then, the special handler calls trap() on its own, which now sees the original eflags with PSL_I set and does not enable interrupts. The end result is that signal delivery and process exiting code could be executed with interrupts disabled, which is generally wrong and triggers several assertions. For amd64, the interrupts are enabled conditionally based on PSL_I in the eflags of the outer frame, as it is already done for doreti_iret_fault. For i386, the interrupts are enabled unconditionally, the ast loop could have opened a window with interrupts enabled just before the iret anyway. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 7a1c55c3	15-Sep-2011	Konstantin Belousov <kib@FreeBSD.org>	Microoptimize the return path for the fast syscalls on amd64. Arrange the code to have the fall-through path to follow the likely target. Do not use intermediate register to reload user %rsp. Proposed by: alc Reviewed by: alc, jhb Approved by: re (bz) MFC after: 2 weeks
# e4505da6	11-Sep-2011	Konstantin Belousov <kib@FreeBSD.org>	The jump target shall be after the padding, not into it. Reported by: alc Approved by: re (bz) MFC after: 2 weeks
# 8bd1142b	11-Sep-2011	Konstantin Belousov <kib@FreeBSD.org>	Perform amd64-specific microoptimizations for native syscall entry sequence. The effect is ~1% on the microbenchmark. In particular, do not restore registers which are preserved by the C calling sequence. Align the jump target. Avoid unneeded memory accesses by calculating some data in syscall entry trampoline. Reviewed by: jhb Approved by: re (bz) MFC after: 2 weeks
# 5ab73cbc	08-Apr-2011	Konstantin Belousov <kib@FreeBSD.org>	Disable local interrupts before testing the PCB_FULL_IRET flag. Thread might be preempted after testing, which causes the flag to be cleared. If ast was not delivered, we will do sysret with potentially wrong fs/gs bases. Reviewed by: jhb, jkim MFC after: 1 week (together with r220430, r220452)
# 13fb631a	08-Apr-2011	John Baldwin <jhb@FreeBSD.org>	Fix a bug in the previous change to restore the fast path for syscall return. The ast() function may cause a context switch in which case PCB_FULL_IRET would be set in the pcb. However, the code was not rechecking the flag after ast() returned and would not properly restore the FSBASE and GSBASE MSRs. To fix, recheck the PCB_FULL_IRET flag after ast() returns. While here, trim an instruction (and memory access) from the doreti path and fix a typo in a comment. MFC after: 1 week
# 615d2dff	07-Apr-2011	John Baldwin <jhb@FreeBSD.org>	pcb_flags is an int, so use testl rather than testq. Pointy hat to: jhb Submitted by: jkim MFC after: 1 week
# 1438b4ce	07-Apr-2011	John Baldwin <jhb@FreeBSD.org>	If a system call does not request a full interrupt return, use a fast path via the sysretq instruction to return from the system call. This was removed in 190620 and not quite fully restored in 195486. This resolves most of the performance regression in system call microbenchmarks between 7 and 8 on amd64. Reviewed by: kib MFC after: 1 week
# 50e3cec3	22-Dec-2010	Jung-uk Kim <jkim@FreeBSD.org>	Increase size of pcb_flags to four bytes. Requested by: bde, jhb
# e6c006d9	21-Dec-2010	Jung-uk Kim <jkim@FreeBSD.org>	Improve PCB flags handling and make it more robust. Add two new functions for manipulating pcb_flags. These inline functions are very similar to atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK prefix for SMP. Add comments about the rationale[1]. Use these functions wherever possible. Although there are some places where it is not strictly necessary (e.g., a PCB is copied to create a new PCB), it is done across the board for sake of consistency. Turn pcb_full_iret into a PCB flag as it is safe now. Move rarely used fields before pcb_flags and reduce size of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in the neighborhood. Reviewed by: kib Submitted by: kib[1] MFC after: 2 months
# 0f0170e6	06-Dec-2010	Konstantin Belousov <kib@FreeBSD.org>	Retire write-only PCB_FULLCTX pcb flag on amd64. Reminded by: Petr Salinger <Petr.Salinger seznam cz> Tested by: pho MFC after: 1 week
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# cba32694	28-Aug-2010	Rui Paulo <rpaulo@FreeBSD.org>	Register an interrupt vector for DTrace return probes. There is some code missing in lapic to make sure that we don't overwrite this entry, but this will be done on a sequent commit. Sponsored by: The FreeBSD Foundation
# 595473a5	23-Jun-2010	Konstantin Belousov <kib@FreeBSD.org>	Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64 ABI specifies the DF should be zero, and newer compilers do not clear DF before using DF-sensitive instructions. The DF clearing for signal handlers was done some time ago. MFC after: 1 week
# 3c5636cd	19-May-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r207958: Route all returns from the interrupts and faults through the doreti_iret labeled iretq instruction. MFC r208026: Do not use .extern.
# 6a440fc3	12-May-2010	Konstantin Belousov <kib@FreeBSD.org>	Route all returns from the interrupts and faults through the doreti_iret labeled iretq instruction. Suppose that multithreaded process executes two threads, currently scheduled on different processors. Let assume that thread A executes using %cs or %ss pointing into the descriptor from LDT. If IPI comes which handler does not return by jump to doreti, and meantime thread B invalidates descriptor pointed to by %cs or %ss, then iretq from IPI handler could fault. Routing the return by doreti_iret allows kernel to catch the situation and recover from it by sending signal to the usermode. Tested by: pho MFC after: 1 week
# d4b0face	05-May-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r207570: Style and comment adjustements.
# bf6f1b56	03-May-2010	Konstantin Belousov <kib@FreeBSD.org>	Style and comment adjustements. Suggested and reviewed by: bde MFC after: 3 days
# b8fde9ef	17-Apr-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r206623: ld_gs_base is executing with stack containing only the frame, temporary pushed %rflags has been popped already.
# b71e04d3	14-Apr-2010	Konstantin Belousov <kib@FreeBSD.org>	ld_gs_base is executing with stack containing only the frame, temporary pushed %rflags has been popped already. Pointy hat to: kib MFC after: 3 days
# a0e70f39	13-Apr-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r206459: Handle a case when non-canonical address is loaded into the fsbase or gsbase MSR.
# a35d07a8	10-Apr-2010	Konstantin Belousov <kib@FreeBSD.org>	Handle a case when non-canonical address is loaded into the fsbase or gsbase MSR. MFC after: 3 days
# 4ccf64eb	06-Apr-2010	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	MFC r205014,205015: Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. This MFC is required for MFCs of later changes to the freebsd32 compatibility from HEAD. Requested by: kib
# 841c0c7e	11-Mar-2010	Nathan Whitehorn <nwhitehorn@FreeBSD.org>	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb
# 32580301	25-Feb-2010	Attilio Rao <attilio@FreeBSD.org>	Introduce the new kernel sub-tree x86 which should contain all the code shared and generalized between our current amd64, i386 and pc98. This is just an initial step that should lead to a more complete effort. For the moment, a very simple porting of cpufreq modules, BIOS calls and the whole MD specific ISA bus part is added to the sub-tree but ideally a lot of code might be added and more shared support should grow. Sponsored by: Sandvine Incorporated Reviewed by: emaste, kib, jhb, imp Discussed on: arch MFC: 3 weeks
# d77e2734	10-Jul-2009	Konstantin Belousov <kib@FreeBSD.org>	When amd64 CPU cannot load segment descriptor during trap return to usermode, it generates GPF, that is mirrored to user mode as SIGSEGV. The offending register in mcontext should contain the value loading of which generated the GPF, and it is so on i386. On amd64, we currently report segment descriptor in tf_err, while segment register contains the corrected value loaded by trap handler. Fix the issue by behaving like i386, reloading segment register in trap frame after signal frame is pushed onto user stack. Noted and tested by: pho Approved by: re (kensmith)
# a2622e5d	09-Jul-2009	Konstantin Belousov <kib@FreeBSD.org>	Restore the segment registers and segment base MSRs for amd64 syscall return path only when neither thread was context switched while executing syscall code nor syscall explicitely modified LDT or MSRs. Save segment registers in trap handlers before interrupts are enabled, to not allow context switches to happen before registers are saved. Use separated byte in pcb for indication of fast/full return, since pcb_flags are not synchronized with context switches. The change puts back syscall microbenchmark numbers that were slowed down after commit of the support for LDT on amd64. Reviewed by: jeff Tested (and tested, and tested ...) by: pho Approved by: re (kensmith)
# 2c66ccca	01-Apr-2009	Konstantin Belousov <kib@FreeBSD.org>	Save and restore segment registers on amd64 when entering and leaving the kernel on amd64. Fill and read segment registers for mcontext and signals. Handle traps caused by restoration of the invalidated selectors. Implement user-mode creation and manipulation of the process-specific LDT descriptors for amd64, see sysarch(2). Implement support for TSS i/o port access permission bitmap for amd64. Context-switch LDT and TSS. Do not save and restore segment registers on the context switch, that is handled by kernel enter/leave trampolines now. Remove segment restore code from the signal trampolines for freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason. Implement amd64-specific compat shims for sysarch. Linuxolator (temporary ?) switched to use gsbase for thread_area pointer. TODO: Currently, gdb is not adapted to show segment registers from struct reg. Also, no machine-depended ptrace command is added to set segment registers for debugged process. In collaboration with: pho Discussed with: peter Reviewed by: jhb Linuxolator tested by: dchagin
# bb471e33	03-Feb-2009	Joseph Koshy <jkoshy@FreeBSD.org>	Improve robustness of NMI handling, for NMIs recognized in kernel mode. - Make the NMI handler run on its own stack (TSS_IST2). - Store the GSBASE value for each CPU just before the start of each NMI stack, permitting efficient retrieval using %rsp-relative addressing. - For NMIs taken from kernel mode, program MSR_GSBASE explicitly since one or both of MSR_GSBASE and MSR_KGSBASE can be potentially invalid. The current contents of MSR_GSBASE are saved and restored at exit. - For NMIs handled from user mode, continue to use 'swapgs' to load the per-CPU GSBASE. Reviewed by: jeff Debugging help: jeff Tested by: gnn, Artem Belevich <artemb at gmail dot com>
# a353a345	14-Jan-2009	Konstantin Belousov <kib@FreeBSD.org>	Disable interrupts, if they were enabled, before doing swapgs. Otherwise, interrupt may happen while we run with kernel CS and usermode gsbase. Reviewed by: jeff MFC after: 1 week
# 4e706fe3	14-Dec-2008	Joseph Koshy <jkoshy@FreeBSD.org>	Bug fix: %ebx needs to be preserved in the user callchain capture path.
# 6fe00c78	13-Dec-2008	Joseph Koshy <jkoshy@FreeBSD.org>	- Bug fix: prevent a thread from migrating between CPUs between the time it is marked for user space callchain capture in the NMI handler and the time the callchain capture callback runs. - Improve code and control flow clarity by invoking hwpmc(4)'s user space callchain capture callback directly from low-level code. Reviewed by: jhb (kern/subr_trap.c) Testing (various patch revisions): gnn, Fabien Thomas <fabien dot thomas at netasq dot com>, Artem Belevich <artemb at gmail dot com>
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 8ad85ff2	18-Aug-2008	Konstantin Belousov <kib@FreeBSD.org>	The doreti_iret_fault code is always called with gs base MSR containing kernel gs base, because %rip is adjusted only on kernel-mode trap caused by iretq execution. On the other hand, the stack contains (hardware part of) trap frame from the usermode. As a consequence, checking for frame mode and doing swapgs causes the kernel to enter trap() with usermode gs base. Remove the check for mode and conditional swapgs, we already have right gs base in the MSR. Submitted by: Nate Eldredge <neldredge math ucsd edu> MFC after: 3 days
# 367f3ce5	24-May-2008	John Birrell <jb@FreeBSD.org>	Add the DTrace hooks for exception handling (Function boundary trace -fbt- provider), cyclic clock and syscalls.
# d07f36b0	07-Dec-2007	Joseph Koshy <jkoshy@FreeBSD.org>	Kernel and hwpmc(4) support for callchain capture. Sponsored by: FreeBSD Foundation and Google Inc.
# 185250da	15-Nov-2007	John Baldwin <jhb@FreeBSD.org>	Add support for cross double fault frames in stack traces: - Populate the register values for the trapframe put on the stack by the double fault handler. - Teach DDB's trace routine to treat a double fault like other trap frames. MFC after: 3 days
# 4b0f4e9d	22-Dec-2006	David Xu <davidxu@FreeBSD.org>	Fix a panic when rebooting a SMP machine, when option STOP_NMI is used, nmi handler is used to stop other processors, nmi hander calls trap(), however, trap() now accepts a pointer rather than a reference, this was changed by kmacy@.
# e5f8d409	16-Dec-2006	Kip Macy <kmacy@FreeBSD.org>	Newer versions of gcc don't support treating structures passed by value as if they were really passed by reference. Specifically, the dead stores elimination pass in the GCC 4.1 optimiser breaks the non-compliant behavior on which FreeBSD relied. This change brings FreeBSD up to date by switching trap frames to being explicitly passed by reference. Reviewed by: kan Tested by: kan
# 7ef5ed2b	27-Aug-2005	Joseph Koshy <jkoshy@FreeBSD.org>	- Special-case NMI handling on the AMD64. On entry or exit from the kernel the 'alltraps' and 'doreti' code used taken by normal traps disables interrupts to protect the critical sections where it is setting up %gs. This protection is insufficient in the presence of NMIs since NMIs can be taken even when the processor has disabled normal interrupts. Thus the NMI handler needs to actually read MSR_GBASE on entry to the kernel to determine whether a swap of %gs using 'swapgs' is needed. However, reads of MSRs are expensive and integrating this check into the 'alltraps'/'doreti' path would penalize normal interrupts. - Teach DDB about the 'nmi_calltrap' symbol. Reviewed by: bde, peter (older versions of this change)
# 0e7bd54c	25-Aug-2005	Stephan Uphoff <ups@FreeBSD.org>	NMI handler should not enable interrupts. Tested by: kris@ MFC after: 3 weeks
# 2bdfd0d8	29-Jun-2005	Peter Wemm <peter@FreeBSD.org>	Add a special-case handler for general protection faults. It appears to be possible to get the swapgs state reversed if doreti traps during the iretq. Attempt to handle this. load_gs() might need special handling too. Running the kernel with the user's TLS and the kernel's PCPU space interchanged would be bad(TM). Discovered as a result of a conversation with: bde Approved by: re
# e973a7af	23-Jun-2005	Peter Wemm <peter@FreeBSD.org>	Eliminate a source of 'trap xx with interrupts disabled'. I was jumping to the wrong backend code and neglecting to re-enable interrupts after the stack prep. Approved by: re
# ea5d7683	22-May-2005	Peter Wemm <peter@FreeBSD.org>	Fix some of the problems Bruce observed with this code.
# c592ba5f	20-May-2005	Peter Wemm <peter@FreeBSD.org>	For non-profiling kernels, there were two symbols assigned to the same address. One was alltraps_with_regs_pushed, the other was calltrap. When the stack tracer walks up, it looks for magic symbol names to determine how to parse non-standard stack frames, such as a trapframe. It was looking for "calltrap". Which of the two symbols you got depended on things like Phase of moon, etc. If you were unlucky, you got a garbage stack trace for things like 'debug.trace_on_panic', which would completely hide the actual source of the problem.
# ba2426ff	20-Jan-2005	Peter Wemm <peter@FreeBSD.org>	MFi386: whitespace, copyright header, etc updates
# 7a071c6a	15-Aug-2004	David E. O'Brien <obrien@FreeBSD.org>	Complete 'IA32' -> 'COMPAT_IA32' change for the Linuxulator32.
# d7d197e0	23-May-2004	Bruce Evans <bde@FreeBSD.org>	Oops, ".align 4" for the data section in the previous commit should have been ".p2align 4". This bug is cosmetic since the data section happens to be empty.
# 003d5d66	23-May-2004	Bruce Evans <bde@FreeBSD.org>	Fixed profiling of trap, syscall and interrupt handlers and some ordinary functions, essentially by backing out half of rev.1.115 of amd64/exception.S. The handlers must be between certain labels for the purposes of profiling, and this was broken by scattering them in separately compiled .S files, especially for ordinary functions that ended up between the labels. Merge the files by #including them as before, except with different pathnames and better comments and organization. Changes to the scattered files are minimal -- just move the labels to the file that does the #includes. This also partly fixes profiling of IPIs -- all IPI handlers are now correctly classified as interrupt handlers, but many are still missing mcount calls.
# 03d5ca33	23-May-2004	Bruce Evans <bde@FreeBSD.org>	Restored FAKE_MCOUNT() and MEXITCOUNT invocations and adjusted them for amd64 as necessary. This is routine, except: - the FAKE_MCOUNT($bintr) in doreti was missing the '$'. This gave a a garbage address made up of padding bytes (with the nop byte 0x90 as the MSB) instead of the intended address of bintr. This accidentally worked on i386's because (0x90 << 24) is close enough to bintr, but it doesn't work on amd64's because (0x90 << 56) is much further away from bintr. - the FAKE_MCOUNT($btrap) in calltrap was similarly broken. It hasn't been needed since FreeBSD-1, so just delete it.
# 29ae923f	05-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# 0d2a2989	17-Nov-2003	Peter Wemm <peter@FreeBSD.org>	Initial landing of SMP support for FreeBSD/amd64. - This is heavily derived from John Baldwin's apic/pci cleanup on i386. - I have completely rewritten or drastically cleaned up some other parts. (in particular, bootstrap) - This is still a WIP. It seems that there are some highly bogus bioses on nVidia nForce3-150 boards. I can't stress how broken these boards are. I have a workaround in mind, but right now the Asus SK8N is broken. The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed. - Most of my testing has been with SCHED_ULE. SCHED_4BSD works. - the apic and acpi components are 'standard'. - If you have an nVidia nForce3-150 board, you are stuck with 'device atpic' in addition, because they somehow managed to forget to connect the 8254 timer to the apic, even though its in the same silicon! ARGH! This directly violates the ACPI spec.
# e46fe241	12-Nov-2003	Peter Wemm <peter@FreeBSD.org>	Stop pretending to support kernel profiling. The FAKE_MCOUNT() etc calls are just gradually getting more and more stale. At this point it would be better to start from scratch once prof_machdep.c is adapted.
# 19acc770	14-Oct-2003	Peter Wemm <peter@FreeBSD.org>	Pull the tier-2 card one last time and break the get/setcontext and sigreturn() ABI and the signal context on the stack. Make the trapframe (and its shadows in the ucontext and sigframe etc) 8 bytes larger in order to preserve 16 byte stack alignment for the following C code calls. I could have done some padding after the trapframe was saved, but some of the C code still expects an argument of 'struct trapframe'. Anyway, this gives me a spare field that can be used to store things like 'partial trapframe' status or something else in the future. The runtime impact is fairly small, except for threaded apps and things that decode contexts and the signal stack (eg: cvsup binary). Signal delivery isn't too badly affected because the kernel generates the sigframe that sigreturn uses after the handler has been called. The size of mcontext_t and struct sigframe hasn't changed. Only the last few fields (sc_eip etc) got moved a little and I eliminated a spare field. mc_len/sc_len did change location though so the sanity checks there will still trap it.
# e63f19e1	22-Sep-2003	Peter Wemm <peter@FreeBSD.org>	MFi386 rev 1.105 by jhb: fix comment typo
# 2c25d124	09-Sep-2003	Peter Wemm <peter@FreeBSD.org>	Clean up get/set_mcontext() and get/set_fpcontext(). These are operated on data structures on the kernel stack which are guaranteed to be 16 byte aligned by gcc, the amd64 ABI and __aligned(16). Ensire the tss_rsp0 initial stack pointer is 16 byte aligned in case sizeof(pcb) becomes odd at some point. This is convenient for the interrupt handler case because the ring crossing pushes cause the required odd alignment before the call to the C code. Have fast_syscall add an additional 8 bytes to ensure that the trapframe has the correct odd alignment for the call to C code. Note that there are no checks to make sure that the trapframe size is appropriate for this. This makes get/setfpcontext work properly (finally). You get a GPF in kernel mode if any of this is botched without the alignment fixup code that is apparently needed on i386.
# d85631c4	13-May-2003	Peter Wemm <peter@FreeBSD.org>	Add BASIC i386 binary support for the amd64 kernel. This is largely stolen from the ia64/ia32 code (indeed there was a repocopy), but I've redone the MD parts and added and fixed a few essential syscalls. It is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic) and p4. The ia64 code has not implemented signal delivery, so I had to do that. Before you say it, yes, this does need to go in a common place. But we're in a freeze at the moment and I didn't want to risk breaking ia64. I will sort this out after the freeze so that the common code is in a common place. On the AMD64 side, this required adding segment selector context switch support and some other support infrastructure. The %fs/%gs etc code is hairy because loading %gs will clobber the kernel's current MSR_GSBASE setting. The segment selectors are not used by the kernel, so they're only changed at context switch time or when changing modes. This still needs to be optimized. Approved by: re (amd64/* blanket)
# 0fe93e74	12-May-2003	Peter Wemm <peter@FreeBSD.org>	For the page fault handler, save %cr2 in the outer trap handler so that we do not have to run so long with interrupts disabled. This involved creating tf_addr in the trapframe. Reorganize the trap stubs so that they consistently reserve the stack space and initialize any missing bits. Approved by: re (amd64 stuff)
# bf1e8974	11-May-2003	Peter Wemm <peter@FreeBSD.org>	Give a %fs and %gs to userland. Use swapgs to obtain the kernel %GS.base value on entry and exit. This isn't as easy as it sounds because when we recursively trap or interrupt, we have to avoid duplicating the swapgs instruction or we end up back with the userland %gs. I implemented this by testing TF_CS to see if we're coming from supervisor mode already, and check for returning to supervisor. To avoid a race with interrupts in the brief period after beginning executing the handler and before the swapgs, convert all trap gates to interrupt gates, and reenable interrupts immediately after the swapgs. I am not happy with this. There are other possible ways to do this that should be investigated. (eg: storing the GS.base MSR value in the trapframe) Add some sysarch functions to let the userland code get to this. Approved by: re (blanket amd64/*)
# 4ce3e250	11-May-2003	Peter Wemm <peter@FreeBSD.org>	Since compiling natively, the compile environment has been less forgiving about silly typos. Use the correct comment sequences.
# 9c43b77f	07-May-2003	Peter Wemm <peter@FreeBSD.org>	Fix a preemption race. I was reenabling interrupts in the fast system call handler before it was safe. It was possible for to lose context and for something else to clobber the PCPU scratch variable. This moves the interrupt enable way too late, but its better safe than sorry for the moment.
# 5b27e934	02-May-2003	Peter Wemm <peter@FreeBSD.org>	Repocopy .s to .S
# afa88623	30-Apr-2003	Peter Wemm <peter@FreeBSD.org>	Commit MD parts of a loosely functional AMD64 port. This is based on a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to attempt to get a stable base to start from. There is a lot missing still. Worth noting: - The kernel runs at 1GB in order to cheat with the pmap code. pmap uses a variation of the PAE code in order to avoid having to worry about 4 levels of page tables yet. - It boots in 64 bit "long mode" with a tiny trampoline embedded in the i386 loader. This simplifies locore.s greatly. - There are still quite a few fragments of i386-specific code that have not been translated yet, and some that I cheated and wrote dumb C versions of (bcopy etc). - It has both int 0x80 for syscalls (but using registers for argument passing, as is native on the amd64 ABI), and the 'syscall' instruction for syscalls. int 0x80 preserves all registers, 'syscall' does not. - I have tried to minimize looking at the NetBSD code, except in a couple of places (eg: to find which register they use to replace the trashed %rcx register in the syscall instruction). As a result, there is not a lot of similarity. I did look at NetBSD a few times while debugging to get some ideas about what I might have done wrong in my first attempt.
# 4a338afd	17-Feb-2003	Julian Elischer <julian@FreeBSD.org>	Move a bunch of flags from the KSE to the thread. I was in two minds as to where to put them in the first case.. I should have listenned to the other mind. Submitted by: parts by davidxu@ Reviewed by: jeff@ mini@
# 6f8132a8	31-Jan-2003	Julian Elischer <julian@FreeBSD.org>	Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.
# aff81a81	28-Jan-2003	Jake Burkholder <jake@FreeBSD.org>	Remove BDE_DEBUGGER. Discussed with: bde
# 0dbb100b	26-Jan-2003	David Xu <davidxu@FreeBSD.org>	Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transfered to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is atomatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian
# 8c132e9f	06-Nov-2002	David Xu <davidxu@FreeBSD.org>	1.Fix smp race between kernel vm86 BIOS calling and userland vm86 mode code, remove global variable in_vm86call, set vm86 calling flag in PCB flags. 2.Fix vm86 BIOS calling preempted problem by changing vm86_lock mutex type from MTX_DEF to MTX_SPIN. vm86pcb is not remembered in thread struct, when the thread calling vm86 BIOS is preempted by interrupt thread, and later switching back to the thread would cause incorrect context be loaded into CPU registers, this leads to kernel crash.
# 23efa1f8	27-Jul-2002	Peter Wemm <peter@FreeBSD.org>	Unwind the syscall_with_err_pushed tweak that jake did some time back. OK'ed by: jake
# 83587634	10-Jul-2002	Julian Elischer <julian@FreeBSD.org>	fix a comment and note a problem with XXXSMP
# 50b6a555	10-Jul-2002	Matthew Dillon <dillon@FreeBSD.org>	Remove the critmode sysctl - the new method for critical_enter/exit (already the default) is now the only method for i386. Remove the paraphanalia that supported critmode. Remove td_critnest, clean up the assembly, and clean up (mostly remove) the old junk from cpu_critical_enter() and cpu_critical_exit().
# f55348f0	09-Jul-2002	Julian Elischer <julian@FreeBSD.org>	Include all of isa/ipl.s into exception.s as there is now nothing left in ipl.s except doreti which really belongs in with the exceptions as it's just the other side of the same coin. Will remove ipl.s in a separate commit. Agreed by: several including bde@freebsd.org
# d74ac681	26-Mar-2002	Matthew Dillon <dillon@FreeBSD.org>	Compromise for critical()/cpu_critical() recommit. Cleanup the interrupt disablement assumptions in kern_fork.c by adding another API call, cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it from MI to MD. Temporarily move cpu_critical() from <arch>/include/cpufunc.h to <arch>/<arch>/critical.c (stage-2 will clean this up). Implement interrupt deferral for i386 that allows interrupts to remain enabled inside critical sections. This also fixes an IPI interlock bug, and requires uses of icu_lock to be enclosed in a true interrupt disablement. This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized, and will move cpu_critical() into its own header file(s) + other things. This commit may break non-i386 architectures in trivial ways. This should be temporary. Reviewed by: core Approved by: core
# 181df8c9	26-Feb-2002	Matthew Dillon <dillon@FreeBSD.org>	revert last commit temporarily due to whining on the lists.
# f96ad4c2	26-Feb-2002	Matthew Dillon <dillon@FreeBSD.org>	STAGE-1 of 3 commit - allow (but do not require) interrupts to remain enabled in critical sections and streamline critical_enter() and critical_exit(). This commit allows an architecture to leave interrupts enabled inside critical sections if it so wishes. Architectures that do not wish to do this are not effected by this change. This commit implements the feature for the I386 architecture and provides a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For now you can turn the sysctl on and off at any time in order to test the architectural changes or track down bugs. This commit is just the first stage. Some areas of the code, specifically the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will be cleaned up in the STAGE-2 commit when the critical_() functions are moved entirely into MD files. The following changes have been made: critical_enter() and critical_exit() for I386 now simply increment and decrement curthread->td_critnest. They no longer disable hard interrupts. When critical_exit() decrements the counter to 0 it effectively calls a routine to deal with whatever interrupts were deferred during the time the code was operating in a critical section. Other architectures are unaffected. * fork_exit() has been conditionalized to remove MD assumptions for the new code. Old code will still use the old MD assumptions in regards to hard interrupt disablement. In STAGE-2 this will be turned into a subroutine call into MD code rather then hardcoded in MI code. The new code places the burden of entering the critical section in the trampoline code where it belongs. * I386: interrupts are now enabled while we are in a critical section. The interrupt vector code has been adjusted to deal with the fact. If it detects that we are in a critical section it currently defers the interrupt by adding the appropriate bit to an interrupt mask. * In order to accomplish the deferral, icu_lock is required. This is i386-specific. Thus icu_lock can only be obtained by mainline i386 code while interrupts are hard disabled. This change has been made. * Because interrupts may or may not be hard disabled during a context switch, cpu_switch() can no longer simply assume that PSL_I will be in a consistent state. Therefore, it now saves and restores eflags. * FAST INTERRUPT PROVISION. Fast interrupts are currently deferred. The intention is to eventually allow them to operate either while we are in a critical section or, if we are able to restrict the use of sched_lock, while we are not holding the sched_lock. * ICU and APIC vector assembly for I386 cleaned up. The ICU code has been cleaned up to match the APIC code in regards to format and macro availability. Additionally, the code has been adjusted to deal with deferred interrupts. * Deferred interrupts use a per-cpu boolean int_pending, and masks ipending, spending, and fpending. Being per-cpu variables it is not currently necessary to lock; bus cycles modifying them. Note that the same mechanism will enable preemption to be incorporated as a true software interrupt without having to further hack up the critical nesting code. * Note: the old critical_enter() code in kern/kern_switch.c is currently #ifdef to be compatible with both the old and new methodology. In STAGE-2 it will be moved entirely to MD code. Performance issues: One of the purposes of this commit is to enhance critical section performance, specifically to greatly reduce bus overhead to allow the critical section code to be used to protect per-cpu caches. These caches, such as Jeff's slab allocator work, can potentially operate very quickly making the effective savings of the new critical section code's performance very significant. The second purpose of this commit is to allow architectures to enable certain interrupts while in a critical section. Specifically, the intention is to eventually allow certain FAST interrupts to operate rather then defer. The third purpose of this commit is to begin to clean up the critical_enter()/critical_exit()/cpu_critical_enter()/ cpu_critical_exit() API which currently has serious cross pollution in MI code (in fork_exit() and ast() for example). The fourth purpose of this commit is to provide a framework that allows kernel-preempting software interrupts to be implemented cleanly. This is currently used for two forward interrupts in I386. Other architectures will have the choice of using this infrastructure or building the functionality directly into critical_enter()/ critical_exit(). Finally, this commit is designed to greatly improve the flexibility of various architectures to manage critical section handling, software interrupts, preemption, and other highly integrated architecture-specific details.
# d2f22d70	10-Feb-2002	Bruce Evans <bde@FreeBSD.org>	Garbage-collect the "LOCORE" version of MPLOCKED.
# a0f831f5	24-Aug-2001	John Baldwin <jhb@FreeBSD.org>	Remove references to the old giant kernel lock in various comments.
# d072acf9	12-Aug-2001	Bruce Evans <bde@FreeBSD.org>	Removed he BPTTRAP() macro and its use. It was intended for restoring bug for bug compatibility to ddb trap handlers after fixing the debugger trap gates to be interrupt gates, but the fix was never committed. Now I want the fix to apply to ddb.
# 9d146ac5	12-Jul-2001	Peter Wemm <peter@FreeBSD.org>	Activate SSE/SIMD. This is the extra context switching support that we are required to do if we let user processes use the extra 128 bit registers etc. This is the base part of the diff I got from: http://www.issei.org/issei/FreeBSD/sse.html I believe this is by: Mr. SUZUKI Issei <issei@issei.org> SMP support apparently by: Takekazu KATO <kato@chino.it.okayama-u.ac.jp> Test code by: NAKAMURA Kazushi <kaz@kobe1995.net>, see http://kobe1995.net/~kaz/FreeBSD/SSE.en.html I have fixed a couple of style(9) deviations. I have some followup commits to fix a couple of non-style things.
# 1c1771cb	22-May-2001	Bruce Evans <bde@FreeBSD.org>	Convert npx interrupts into traps instead of vice versa. This is much simpler for npx exceptions that start as traps (no assembly required...) and works better for npx exceptions that start as interrupts (there is no longer a problem for nested interrupts). Submitted by: original (pre-SMPng) version by luoqi
# 8bd57f8f	15-May-2001	John Baldwin <jhb@FreeBSD.org>	Remove unneeded includes of sys/ipl.h and machine/ipl.h.
# 02318dac	24-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	Remove the leading underscore from all symbols defined in x86 asm and used in C or vice versa. The elf compiler uses the same names for both. Remove asnames.h with great prejudice; it has served its purpose. Note that this does not affect the ability to generate an aout kernel due to gcc's -mno-underscores option. moral support from: peter, jhb
# 631d7bf3	24-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	- Rename the lcall system call handler from Xsyscall to Xlcall_syscall to be more like Xint0x80_syscall and less like c function syscall(). - Reduce code duplication between the int0x80 and lcall handlers by shuffling the elfags into the right place, saving the sizeof the instruction in tf_err and jumping into the common int0x80 code. Reviewed by: peter
# d888fc4e	11-Feb-2001	Mark Murray <markm@FreeBSD.org>	RIP <machine/lock.h>. Some things needed bits of <i386/include/lock.h> - cy.c now has its own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK() has been moved to <i386/include/apic.h> (AKA <machine/apic.h>). Reviewed by: jhb
# 142ba5f3	09-Feb-2001	John Baldwin <jhb@FreeBSD.org>	- Make astpending and need_resched process attributes rather than CPU attributes. This is needed for AST's to be properly posted in a preemptive kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and PS_NEEDRESCHED. They are still accesssed by their old macros: aston(), astoff(), etc. For completeness, an astpending() macro has been added to check for a pending AST, and clear_resched() has been added to clear need_resched(). - Rename syscall2() on the x86 back to syscall() to be consistent with other architectures.
# 2a36ec35	24-Jan-2001	John Baldwin <jhb@FreeBSD.org>	- Change fork_exit() to take a pointer to a trapframe as its 3rd argument instead of a trapframe directly. (Requested by bde.) - Convert the alpha switch_trampoline to call fork_exit() and use the MI fork_return() instead of child_return(). - Axe child_return().
# 89cb8fc9	24-Jan-2001	John Baldwin <jhb@FreeBSD.org>	Call fork_exit() now instead of futzing around in assembly during a fork return.
# a448b62a	21-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Make intr_nesting_level per-process, rather than per-cpu. Setup interrupt threads to run with it always >= 1, so that malloc can detect M_WAITOK from "interrupt" context. This is also necessary in order to context switch from sched_ithd() directly. Reviewed By: peter
# 87dce368	19-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Simplify the i386 asm MTX_{ENTER,EXIT} macros to just call the appropriate function, rather than doing a horse-and-buggy acquire. They now take the mutex type as an arg and can be used with sleep as well as spin mutexes.
# c1ef8aac	19-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	- Make npx_intr INTR_MPSAFE and move acquiring Giant into the function itself. - Remove a hack to allow acquiring Giant from the npx asm trap vector.
# 558226ea	19-Jan-2001	Peter Wemm <peter@FreeBSD.org>	Use #ifdef DEV_NPX from opt_npx.h instead of #if NNPX > 0 from npx.h
# 41ed17bf	06-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Use %fs to access per-cpu variables in uni-processor kernels the same as multi-processor kernels. The old way made it difficult for kernel modules to be portable between uni-processor and multi-processor kernels. It is no longer necessary to jump through hoops. - always load %fs with the private segment on entry to the kernel - change the type of the self referntial pointer from struct privatespace to struct globaldata - make the globaldata symbol have value 0 in all cases, so the symbols in globals.s are always offsets, not aliases for fields in globaldata - define the globaldata space used for uniprocessor kernels in C, rather than assembler - change the assmebly language accessors to use %fs, add a macro PCPU_ADDR(member, reg), which loads the register reg with the address of the per-cpu variable member
# 6d43764a	13-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	Introduce a new potientially cleaner interface for accessing per-cpu variables from i386 assembly language. The syntax is PCPU(member) where member is the capitalized name of the per-cpu variable, without the gd_ prefix. Example: movl %eax,PCPU(CURPROC). The capitalization is due to using the offsets generated by genassym rather than the symbols provided by linking with globals.o. asmacros.h is the wrong place for this but it seemed as good a place as any for now. The old implementation in asnames.h has not been removed because it is still used to de-mangle the symbols used by the C variables for the UP case.
# 21ad98bc	30-Nov-2000	Jake Burkholder <jake@FreeBSD.org>	Change doreti to take a trapframe instead of an intrframe. Remove associated pushes of dummy units to convert frame. Reviewed by: jhb
# 4ae465d0	14-Nov-2000	John Baldwin <jhb@FreeBSD.org>	Always enable interrupts during fork_trampoline() after releasing the sched_lock. This is needed for kernel threads that are created before interrupts are enabled. kthreads created by kld's that are created at SI_SUB_KLD such as the random kthread. Tested by: phk
# 03cea59c	05-Oct-2000	John Baldwin <jhb@FreeBSD.org>	Remove an unnecessary sti and spl0() in fork_trampoline. Interrupts should be enabled by MTX_EXIT() now when it releases the sched_lock.
# 1931cf94	05-Oct-2000	John Baldwin <jhb@FreeBSD.org>	- Heavyweight interrupt threads on the alpha for device I/O interrupts. - Make softinterrupts (SWI's) almost completely MI, and divorce them completely from the x86 hardware interrupt code. - The ihandlers array is now gone. Instead, there is a MI shandlers array that just contains SWI handlers. - Most of the former machine/ipl.h files have moved to a new sys/ipl.h. - Stub out all the spl*() functions on all architectures. Submitted by: dfr
# 0384fff8	06-Sep-2000	Jason Evans <jasone@FreeBSD.org>	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
# fc81cf82	09-May-2000	David E. O'Brien <obrien@FreeBSD.org>	1. `movl' is for use with 32-bit operands. Do NOT use it with 16-bit operands. `movw' could be used, but instead let the assembler decide the right instruction to use. 2. AT&T asm syntax requires a leading '*' in front of the operand for indirect calls and jumps.
# bd5caafc	28-Mar-2000	Matthew Dillon <dillon@FreeBSD.org>	The SMP cleanup commit broke need_resched, this fixes that and also removed unncessary MPLOCKED and 'lock' prefixes from the interrupt nesting level, since (A) the MP lock is held at the time, and (B) since the neting level is restored prior to return any interrupted code will see a consistent value.
# 36e9f877	28-Mar-2000	Matthew Dillon <dillon@FreeBSD.org>	Commit major SMP cleanups and move the BGL (big giant lock) in the syscall path inward. A system call may select whether it needs the MP lock or not (the default being that it does need it). A great deal of conditional SMP code for various deadended experiments has been removed. 'cil' and 'cml' have been removed entirely, and the locking around the cpl has been removed. The conditional separately-locked fast-interrupt code has been removed, meaning that interrupts must hold the CPL now (but they pretty much had to anyway). Another reason for doing this is that the original separate-lock for interrupts just doesn't apply to the interrupt thread mechanism being contemplated. Modifications to the cpl may now ONLY occur while holding the MP lock. For example, if an otherwise MP safe syscall needs to mess with the cpl, it must hold the MP lock for the duration and must (as usual) save/restore the cpl in a nested fashion. This is precursor work for the real meat coming later: avoiding having to hold the MP lock for common syscalls and I/O's and interrupt threads. It is expected that the spl mechanisms and new interrupt threading mechanisms will be able to run in tandem, allowing a slow piecemeal transition to occur. This patch should result in a moderate performance improvement due to the considerable amount of code that has been removed from the critical path, especially the simplification of the spl*() calls. The real performance gains will come later. Approved by: jkh Reviewed by: current, bde (exception.s) Some work taken from: luoqi's patch
# b812525f	25-Nov-1999	Julian Elischer <julian@FreeBSD.org>	Fix out-of-date comment
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# eec2e836	10-Jul-1999	Bruce Evans <bde@FreeBSD.org>	Go back to the old (icu.s rev.1.7 1993) way of keeping the AST-pending bit separate from ipending, since this is simpler and/or necessary for SMP and may even be better for UP. Reviewed by: alc, luoqi, tegge
# fd56d8b7	27-Jun-1999	Alan Cox <alc@FreeBSD.org>	An SMP-specific change: Remove an unnecessary lock acquire and release from every system call. (Storing a 32-bit constant is inherently atomic.) Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
# eb9d435a	01-Jun-1999	Jonathan Lemon <jlemon@FreeBSD.org>	Unifdef VM86. Reviewed by: silence on on -current
# ea2b3e3d	06-May-1999	Bruce Evans <bde@FreeBSD.org>	Fixed profiling of elf kernels. Made high resolution profiling compile for elf kernels (it is broken for all kernels due to lack of egcs support). Renaming of many assembler labels is avoided by declaring by declaring the labels that need to be visible to gprof as having type "function" and depending on the elf version of gprof being zealous about discarding the others. A few type declarations are still missing, mainly for SMP. PR: 9413 Submitted by: Assar Westerlund <assar@sics.se> (initial parts)
# 5206bca1	27-Apr-1999	Luoqi Chen <luoqi@FreeBSD.org>	Enable vmspace sharing on SMP. Major changes are, - %fs register is added to trapframe and saved/restored upon kernel entry/exit. - Per-cpu pages are no longer mapped at the same virtual address. - Each cpu now has a separate gdt selector table. A new segment selector is added to point to per-cpu pages, per-cpu global variables are now accessed through this new selector (%fs). The selectors in gdt table are rearranged for cache line optimization. - fask_vfork is now on as default for both UP and SMP. - Some aio code cleanup. Reviewed by: Alan Cox <alc@cs.rice.edu> John Dyson <dyson@iquest.net> Julian Elischer <julian@whistel.com> Bruce Evans <bde@zeta.org.au> David Greenman <dg@root.com>
# 6182fdbd	16-Apr-1999	Peter Wemm <peter@FreeBSD.org>	Bring the 'new-bus' to the i386. This extensively changes the way the i386 platform boots, it is no longer ISA-centric, and is fully dynamic. Most old drivers compile and run without modification via 'compatability shims' to enable a smoother transition. eisa, isapnp and pccard* are not yet using the new resource manager. Once fully converted, all drivers will be loadable, including PCI and ISA. (Some other changes appear to have snuck in, including a port of Soren's ATA driver to the Alpha. Soren, back this out if you need to.) This is a checkpoint of work-in-progress, but is quite functional. The bulk of the work was done over the last few years by Doug Rabson and Garrett Wollman. Approved by: core
# e7ba67f2	28-Feb-1999	Bruce Evans <bde@FreeBSD.org>	Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu, not per-process. Keep it in `switchtime' consistently. It is now clear that the timestamp is always valid in fork_trampoline() except when the child is running on a previously idle cpu, which can only happen if there are multiple cpus, so don't check or set the timestamp in fork_trampoline except in the (i386) SMP case. Just remove the alpha code for setting it unconditionally, since there is no SMP case for alpha and the code had rotted. Parts reviewed by: dfr, phk
# 1b0b259e	25-Feb-1999	Bruce Evans <bde@FreeBSD.org>	Don't forget to update `switchticks' in corner cases (except for the alpha fork_trampoline(), forget it because it I believe it is only necessary for the unsupported SMP case).
# 0c61bd3a	10-Aug-1998	Bruce Evans <bde@FreeBSD.org>	Fixed restoring of cpl after trap handling. The wrong cpl (SWI_AST_MASK instead of 0) was "restored" after handling a trap that occurred while returning to user mode. This bug was most noticeable for VM86 and is still detected and fixed up (on return from the next exception) in doreti if VM86 is configured.
# e539380e	28-Jul-1998	Bruce Evans <bde@FreeBSD.org>	Set p->p_switchtime to switchtime instead of to the current time in fork_trampoline() if switchtime is valid. This fixes not accounting for the time between the previous context switch and and the current time (when the forked child starts up here) in most cases - the time is now counted in the child's runtime. I think it actually fixes all cases, and switchtime is always valid here, since there must have been a context switch just before the forked child starts up. Some code should be removed if this is correct. The check that switchtime is valid sometimes gives a false negative because the check isn't correct until the after the first context switch after the system has been up for >= 1 second.
# e796e00d	28-May-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Some cleanups related to timecounters and weird ifdefs in <sys/time.h>. Clean up (or if antipodic: down) some of the msgbuf stuff. Use an inline function rather than a macro for timecounter delta. Maintain process "on-cpu" time as 64 bits of microseconds to avoid needless second rollover overhead. Avoid calling microuptime the second time in mi_switch() if we do not pass through _idle in cpu_switch() This should reduce our context-switch overhead a bit, in particular on pre-P5 and SMP systems. WARNING: Programs which muck about with struct proc in userland will have to be fixed. Reviewed, but found imperfect by: bde
# c21410e1	17-May-1998	Poul-Henning Kamp <phk@FreeBSD.org>	s/nanoruntime/nanouptime/g s/microruntime/microuptime/g Reviewed by: bde
# dc733423	17-Apr-1998	Dag-Erling Smørgrav <des@FreeBSD.org>	Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.
# dcfe0058	15-Apr-1998	Bruce Evans <bde@FreeBSD.org>	Fixed breakage of fork accounting in previous commit. A fork benchmark reported about 15 times as much sys time as real time. getmicroruntime() is confusing name.
# 00af9731	04-Apr-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Time changes mark 2: * Figure out UTC relative to boottime. Four new functions provide time relative to boottime. * move "runtime" into struct proc. This helps fix the calcru() problem in SMP. * kill mono_time. * add timespec{add\|sub\|cmp} macros to time.h. (XXX: These may change!) * nanosleep, select & poll takes long sleeps one day at a time Reviewed by: bde Tested by: ache and others
# 640c4313	23-Mar-1998	Jonathan Lemon <jlemon@FreeBSD.org>	Add the ability to make real-mode BIOS calls from the kernel. Currently, everything is contained inside #ifdef VM86, so this option must be present in the config file to use this functionality. Thanks to Tor Egge, these changes should work on SMP machines. However, it may not be throughly SMP-safe. Currently, the only BIOS calls made are memory-sizing routines at bootup, these replace reading the RTC values.
# b3c0d232	27-Oct-1997	Bruce Evans <bde@FreeBSD.org>	Oops, <machine/psl.h> is used unconditionally in -current.
# ba21341d	27-Oct-1997	Bruce Evans <bde@FreeBSD.org>	Cleaned up #includes. Ifdefed conditionally used includes. Finished changing indentation of per-statement comments to 40.
# 98823b23	10-Oct-1997	Peter Wemm <peter@FreeBSD.org>	Convert the VM86 option from a global option to an option only depended on by the files that use it. Changing the VM86 option now only causes a recompile of a dozen files or so rather than the entire kernel.
# 20233f27	07-Sep-1997	Steve Passe <fsmp@FreeBSD.org>	General cleanup of the lock pushdown code. They are grouped and enabled from machine/smptests.h: #define PUSHDOWN_LEVEL_1 #define PUSHDOWN_LEVEL_2 #define PUSHDOWN_LEVEL_3 #define PUSHDOWN_LEVEL_4_NOT
# 78292efe	30-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Another round of lock pushdown. Add a simplelock to deal with disable_intr()/enable_intr() as used in UP kernel. UP kernel expects that this is enough to guarantee exclusive access to regions of code bracketed by these 2 functions. Add a simplelock to bracket clock accesses in clock.c: clock_lock. Help from: Bruce Evans <bde@zeta.org.au>
# 5f642c16	29-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Support for the new FAST_HI algorithm. Improved interrupt handling, fewer silo overflows. With help from: dave adkins <adkin003@gold.tc.umn.edu>
# 886e7896	23-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	The last of the encapsolation of cpl/spl/ipending things into a critical region protected by the simplelock 'cpl_lock'. Notes: - this code is currently controlled on a section by section basis with defines in machine/param.h. All sections are currently enabled. - this code is not as clean as I would like, but that can wait till later. - the "giant lock" still surrounds most instances of this "cpl region". I still have to do the code that arbitrates setting cpl between the top and bottom halves of the kernel. - the possibility of deadlock exists, I am committing the code at this point so as to exercise it and detect any such cases B4 the "giant lock" is removed.
# 4a73d99f	20-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Made PEND_INTS default. Made NEW_STRATEGY default. Removed misc. old cruft. Centralized simple locks into mp_machdep.c Centralized simple lock macros into param.h More cleanup in the direction of making splxx()/cpl MP-safe.
# 7b185ef8	19-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Preperation for moving cpl into critical region access. Several new fine-grained locks. New FAST_INTR() methods: - separate simplelock for FAST_INTR, no more giant lock. - FAST_INTR()s no longer checks ipending on way out of ISR. sio made MP-safe (I hope).
# a5e8237d	10-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Oops, fix breakage to UP kernel.
# e7802310	10-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Added trap specific lock calls: get_fpu_lock, etc. All resolve to the GIANT_LOCK at this time, it is purely a logical partitioning.
# 8a5da002	09-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Minor conditionalization of XXX_MPLOCK on PEND_INTS.
# 48a09cf2	08-Aug-1997	John Dyson <dyson@FreeBSD.org>	VM86 kernel support. Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>, Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>, and probably alot of others. Submitted by: Jnathan Lemon <jlemon@americantv.com>
# da9f0182	30-Jul-1997	Steve Passe <fsmp@FreeBSD.org>	Converted the TEST_LOPRIO code to default. Created mplock functions that save/restore NO registers. Minor cleanup.
# 812e4da7	23-Jul-1997	Steve Passe <fsmp@FreeBSD.org>	New simple_lock code in asm: - s_lock_init() - s_lock() - s_lock_try() - s_unlock() Created lock for IO APIC and apic_imen (SMP version of imen) - imen_lock Code to use imen_lock for access from apic_ipl.s and apic_vector.s. Moved this code outside of mp_lock. It seems to work!!!
# e31521c3	20-Jul-1997	Bruce Evans <bde@FreeBSD.org>	Removed unused #includes.
# dc514523	30-Jun-1997	Bruce Evans <bde@FreeBSD.org>	Un-inline a call to spl0(). It is not time critical, and was only inline because there was no non-inline spl0() to call.
# b3196e4b	22-Jun-1997	Peter Wemm <peter@FreeBSD.org>	Preliminary support for per-cpu data pages. This eliminates a lot of #ifdef SMP type code. Things like _curproc reside in a data page that is unique on each cpu, eliminating the expensive macros like: #define curproc (SMPcurproc[cpunumber()]) There are some unresolved bootstrap and address space sharing issues at present, but Steve is waiting on this for other work. There is still some strictly temporary code present that isn't exactly pretty. This is part of a larger change that has run into some bumps, this part is standalone so it should be safe. The temporary code goes away when the full idle cpu support is finished. Reviewed by: fsmp, dyson
# 5400ed3b	31-May-1997	Peter Wemm <peter@FreeBSD.org>	Include file updates.. <machine/spl.h> -> <machine/ipl.h>, add <machine/ipl.h> to those files that were depending on getting SWI_* implicitly via <machine/cpufunc.h>
# 288e2230	28-May-1997	Peter Wemm <peter@FreeBSD.org>	remove no longer needed opt_smp.h includes
# 694f6724	26-May-1997	Steve Passe <fsmp@FreeBSD.org>	Changed inclusion of isa/icu.s to isa/ipl.s. This is part of the breakup of UP/SMP specific INTerrupt code.
# 8359d00f	07-May-1997	Peter Wemm <peter@FreeBSD.org>	forgotten comment
# 477a642c	26-Apr-1997	Peter Wemm <peter@FreeBSD.org>	Man the liferafts! Here comes the long awaited SMP -> -current merge! There are various options documented in i386/conf/LINT, there is more to come over the next few days. The kernel should run pretty much "as before" without the options to activate SMP mode. There are a handful of known "loose ends" that need to be fixed, but have been put off since the SMP kernel is in a moderately good condition at the moment. This commit is the result of the tinkering and testing over the last 14 months by many people. A special thanks to Steve Passe for implementing the APIC code!
# d12ee02d	13-Apr-1997	Bruce Evans <bde@FreeBSD.org>	Don't forget to set `runtime' in fork_trampoline(). The time slice before switching to a child for the first time was being counted twice. I think this only affected unimportant statistics. Simplified arg handling in fork_trampoline(). splz() doesn't actually smash the registers of interest.
# e2d87711	07-Apr-1997	Peter Wemm <peter@FreeBSD.org>	Lower the spl() of the new process from splhigh() right away, since nothing else will lower it until either much later, or never(?) for kernel processes. This basically re-fixes what Bruce fixed in rev 1.29 of kern_fork.c, which was broken again now the child does not execute back up the fork() calling tree.
# a2a1c95c	07-Apr-1997	Peter Wemm <peter@FreeBSD.org>	The biggie: Get rid of the UPAGES from the top of the per-process address space. (!) Have each process use the kernel stack and pcb in the kvm space. Since the stacks are at a different address, we cannot copy the stack at fork() and allow the child to return up through the function call tree to return to user mode - create a new execution context and have the new process begin executing from cpu_switch() and go to user mode directly. In theory this should speed up fork a bit. Context switch the tss_esp0 pointer in the common tss. This is a lot simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer to each process's tss since the esp0 pointer is a 32 bit pointer, and the sd_base setting is split into three different bit sections at non-aligned boundaries and requires a lot of twiddling to reset. The 8K of memory at the top of the process space is now empty, and unmapped (and unmappable, it's higher than VM_MAXUSER_ADDRESS). Simplity the pmap code to manage process contexts, we no longer have to double map the UPAGES, this simplifies and should measuably speed up fork(). The following parts came from John Dyson: Set PG_G on the UPAGES that are now in kernel context, and invalidate them when swapping them out. Move the upages object (upobj) from the vmspace to the proc structure. Now that the UPAGES (pcb and kernel stack) are out of user space, make rfork(..RFMEM..) do what was intended by sharing the vmspace entirely via reference counting rather than simply inheriting the mappings.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 11282a57	11-Aug-1996	David Greenman <dg@FreeBSD.org>	Add support for i686 machine check trap.
# ee323f62	30-May-1996	Peter Wemm <peter@FreeBSD.org>	Jump some hoops to have the .s code being able to be run through both an ansi and traditional cpp. The nesting rules of macros are different, which required some changes. Use __CONCAT(x,y) instead of //. Redo some comments to use / */ rather than "# comment" because the ansi cpp cares about those, and also cares about quote matching.
# a8c5fef5	02-May-1996	Poul-Henning Kamp <phk@FreeBSD.org>	KGDB is dead. It may come back one day if somebody does it.
# 22a31706	11-Apr-1996	Poul-Henning Kamp <phk@FreeBSD.org>	Make alltraps a .globl so that DDB doesn't make people belive they have an ALIGNFLT on their hands all the time.
# d66a5066	02-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Mega-commit for Linux emulator update.. This has been stress tested under netscape-2.0 for Linux running all the Java stuff. The scrollbars are now working, at least on my machine. (whew! :-) I'm uncomfortable with the size of this commit, but it's too inter-dependant to easily seperate out. The main changes: COMPAT_LINUX is GONE. Most of the code has been moved out of the i386 machine dependent section into the linux emulator itself. The int 0x80 syscall code was almost identical to the lcall 7,0 code and a minor tweak allows them to both be used with the same C code. All kernels can now just modload the lkm and it'll DTRT without having to rebuild the kernel first. Like IBCS2, you can statically compile it in with "options LINUX". A pile of new syscalls implemented, including getdents(), llseek(), readv(), writev(), msync(), personality(). The Linux-ELF libraries want to use some of these. linux_select() now obeys Linux semantics, ie: returns the time remaining of the timeout value rather than leaving it the original value. Quite a few bugs removed, including incorrect arguments being used in syscalls.. eg: mixups between passing the sigset as an int, vs passing it as a pointer and doing a copyin(), missing return values, unhandled cases, SIOC* ioctls, etc. The build for the code has changed. i386/conf/files now knows how to build linux_genassym and generate linux_assym.h on the fly. Supporting changes elsewhere in the kernel: The user-mode signal trampoline has moved from the U area to immediately below the top of the stack (below PS_STRINGS). This allows the different binary emulations to have their own signal trampoline code (which gets rid of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so that the emulator can provide the exact "struct sigcontext *" argument to the program's signal handlers. The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which have the same values as the re-used SA_DISABLE and SA_ONSTACK which are intended for sigaction only. This enables the support of a SA_RESETHAND flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal semantics where the signal handler is reset when it's triggered. makesyscalls.sh no longer appends the struct sysentvec on the end of the generated init_sysent.c code. It's a lot saner to have it in a seperate file rather than trying to update the structure inside the awk script. :-) At exec time, the dozen bytes or so of signal trampoline code are copied to the top of the user's stack, rather than obtaining the trampoline code the old way by getting a clone of the parent's user area. This allows Linux and native binaries to freely exec each other without getting trampolines mixed up.
# 32831552	21-Dec-1995	David Greenman <dg@FreeBSD.org>	Rewrote most of the ddb stack traceback code. These changes are smarter about decoding trap/syscall/interrupt frames and generally works better than the previous stuff. Removed some special (incorrect) frobbing of the frame pointer that was messing some things up with the new traceback code.
# 2838c968	19-Dec-1995	David Greenman <dg@FreeBSD.org>	Implemented a (sorely needed for years) double fault handler to catch stack overflows. It sure would be nice if there was an unmapped page between the PCB and the stack (and that the size of the stack was configurable!). With the way things are now, the PCB will get clobbered before the double fault handler gets control, making somewhat of a mess of things. Despite this, it is still fairly easy to poke around in the overflowed stack to figure out the cause.
# b1529bda	14-Dec-1995	Peter Wemm <peter@FreeBSD.org>	GENERIC/LINT: Remove redundant quoting on some option lines. LINT: add a couple of new/missing/undocumented options files.i386: add linux code so that you can compile a kernel with static linux emulation ("options LINUX") i386/*: use #if defined(COMPAT_LINUX) \|\| defined(LINUX) to enable static support of linux emulation (just like "IBCS2" makes ibcs2 static) The main thing this is going to make obvious, is that the LINUX code (when compiled from LINT) has a lot of warnings, some of which dont look too pleasant..
# 0e1815bb	07-Sep-1995	David Greenman <dg@FreeBSD.org>	Minor cleanup and (very) small micro optimization to Xsyscall (and the linux one)..
# a722b904	15-Aug-1995	Bruce Evans <bde@FreeBSD.org>	Fake a call frame for traps so that `gdb -k' can report where fatal traps occurred. This also helps ddb backtrace through trap frames. Backtracing through syscall and interrupt frames still doesn't work but it is relatively unimportant and more expensive to fix.
# d3628763	11-Jun-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Merge RELENG_2_0_5 into HEAD
# 1e1e0b44	14-Feb-1995	Søren Schmidt <sos@FreeBSD.org>	First attempt to run linux binaries. This is only the changes needed to the generic kernel. The actual emulator is a separate LKM. (not finished yet, sorry). Submitted by: sos@freebsd.org & sef@kithrup.com
# 20415301	14-Jan-1995	Bruce Evans <bde@FreeBSD.org>	Fix security holes in sigreturn(), ptrace() and procfs. sigreturn() attempted to check for insecure and fatal eflags and segment selectors, but missed many cases and got the IOPL check back to front. The other syscalls didn't check at all. sys_process.c, machdep.c: Only allow PT_WRITE_U to write to the registers (ordinary and FP). psl.h, locore.s, machdep.c: Eliminate PSL_MBZ, PSL_MBO and PSL_USERCLR. We are not supposed to assume anything about the reserved bits. Use PSL_USERCHANGE and PSL_KERNEL instead. Rename PSL_USERSET to PSL_USER. exception.s: Define a private label for use by doreti when returning to user mode fails. machdep.c: In syscalls, allow changing only the eflags that can be changed on 486's in user mode (no longer attempt to allow benign IOPL changes; allow changing the nasty PSL_NT; don't allow changing the i586 bits). Don't attempt to check all the cases involving invalid selectors and %eip's. Just check for privilege violations and let the invalid things cause a trap. procfs_machdep.c: Call the ptrace register functions to do all the work for reading and writing ordinary registers and for single stepping. trap.c: Ignore traps caused by PSL_NT being set. Previously, users could cause a fatal trap in user mode by setting PSL_NT and executing an iret, and a fatal trap in kernel mode by setting PSL_NT and making a syscall. PSL_NT was cleared too late and not in enough modes to fix the problem. Make all traps in user mode (except T_NMI) nonfatal. Recover from traps caused by attempting to load invalid user registers in doreti by restarting the traps so that they appear to occur in user mode. --- Fix bogons that I noticed while fixing the above: psl.h: Fix some comments. Uniformize idempotency ifdef. exception.s, machdep.c: Remove rsvd[0-14]. rsvd0 hasn't been reserved since the 486 came out. Replace rsvd0 by `align'. rsvd[0-11] used wrong (magic non-unique) trap numbers. Replace rsvd[1-14] by rsvd. locore.s: Enable alignment check flag on 486's and 586's. machdep.c: Use a better type for kstack[]. Use TFREGP() to find the registers. Reformat ptrace functions from SEF to something closer to KNF. procfs_machdep.c: The wrong pointer to the registers got fixed as a side effect. Implement reading and writing of FP registers. /proc//regs now work (only) for processes that are in memory. Clean up comments. trap.c, trap.h: Remove unused trap types.
# b39b673d	03-Dec-1994	Bruce Evans <bde@FreeBSD.org>	i386/exception.s, Keep track of interrupt nesting level. It is normally 0 for syscalls and traps, but is fudged to 1 for their exit processing in case they metamorphose into an interrupt handler. i386/genassym.c; Remove support for the obsolete pcb_iml and pcb_cmap2. Add support for pcb_inl. i386/swtch.s: Fudge the interrupt nesting level across context switches and in the idle loop so that the work for preemptive context switches gets counted as interrupt time, the work for voluntary context switches gets counted mostly as system time (the part when curproc == 0 gets counted as interrupt time), and only truly idle time gets counted as idle time. Remove obsolete support (commented out and otherwise) for pcb_iml. Load curpcb just before curproc instead of just after so that curpcb is always valid if curproc is. A few more changes like this may fix tracing through context switches. Remove obsolete function swtch_to_inactive(). include/cpu.h: Use the new interrupt nesting level variable to implement a non-fake CLF_INTR() so that accounting for the interrupt state works. You can use top, iostat or (best) an up to date systat to see interrupt overheads. I see the expected huge interrupt overheads for ISA devices (on a 486DX/33, about 55% for an IDE drive transferring 1250K/sec and the same for a WD8013EBT network card transferring 1100K/sec). The huge interrupt overheads for serial devices are unfortunately normally invisible. include/pcb.h: Remove the obsolete pcb_iml and pcb_cmap2. Replace them by padding to preserve binary compatibility. Use part of the new padding for pcb_inl. isa/icu.s: isa/vector.s: Keep track of interrupt nesting level.
# e04520ce	27-Sep-1994	Bruce Evans <bde@FreeBSD.org>	Ensure normal selection and alignment of the text and data sections before including files. vector.s sometimes left the data section misaligned (depending on the configuration) so all the time-critical globals in icu.s were sometimes misaligned.
# f540b106	12-Aug-1994	Garrett Wollman <wollman@FreeBSD.org>	Change all #includes to follow the current Berkeley style. Some of these ``changes'' are actually not changes at all, but CVS sometimes has trouble telling the difference. This also includes support for second-directory compiles. This is not quite complete yet, as `config' doesn't yet do the right thing. You can still make it work trivially, however, by doing the following: rm /sys/compile mkdir /usr/obj/sys/compile ln -s M-. /sys/compile cd /sys/i386/conf config MYKERNEL cd ../../compile/MYKERNEL ln -s /sys @ rm machine ln -s @/i386/include machine make depend make
# d2306226	02-Apr-1994	David Greenman <dg@FreeBSD.org>	New interrupt code from Bruce Evans. In additional to Bruce's attached list of changes, I've made the following additional changes: 1) i386/include/ipl.h renamed to spl.h as the name conflicts with the file of the same name in i386/isa/ipl.h. 2) changed all use of mask (i.e. netmask, biomask, ttymask, etc) to _imask (net_imask, etc). 3) changed vestige of splnet use in if_is to splimp. 4) got rid of "impmask" completely (Bruce had gotten rid of netmask), and are now using net_imask instead. 5) dozens of minor cruft to glue in Bruce's changes. These require changes I made to config(8) as well, and thus it must be rebuilt. -DG from Bruce Evans: sio: o No diff is supplied. Remove the define of setsofttty(). I hope that is enough. .s: o i386/isa/debug.h no longer exists. The event counters became too much trouble to maintain. All function call entry and exception entry counters can be recovered by using profiling kernel (the new profiling supports all entry points; however, it is too slow to leave enabled all the time; it also). Only BDBTRAP() from debug.h is now used. That is moved to exception.s. It might be worth preserving SHOW_BITS() and calling it from _mcount() (if enabled). o T_ASTFLT is now only set just before calling trap(). o All exception handlers set SWI_AST_MASK in cpl as soon as possible after entry and arrange for _doreti to restore it atomically with exiting. It is not possible to set it atomically with entering the kernel, so it must be checked against the user mode bits in the trap frame before committing to using it. There is no place to store the old value of cpl for syscalls or traps, so there are some complications restoring it. Profiling stuff (mostly in .s): o Changes to kern/subr_mcount.c, gcc and gprof are not supplied yet. o All interesting labels `foo' are renamed `_foo' and all uninteresting labels `_bar' are renamed `bar'. A small change to gprof allows ignoring labels not starting with underscores. o MCOUNT_LABEL() is to provide names for counters for times spent in exception handlers. o FAKE_MCOUNT() is a version of MCOUNT() suitable for exception handlers. Its arg is the pc where the exception occurred. The new mcount() pretends that this was a call from that pc to a suitable MCOUNT_LABEL(). o MEXITCOUNT is to turn off any timer started by MCOUNT(). /usr/src/sys/i386/i386/exception.s: o The non-BDB BPTTRAP() macros were doing a sti even when interrupts were disabled when the trap occurred. The sti (fixed) sti is actually a no-op unless you have my changes to machdep.c that make the debugger trap gates interrupt gates, but fixing that would make the ifdefs messier. ddb seems to be unharmed by both interrupts always disabled and always enabled (I had the branch in the fix back to front for some time :-(). o There is no known pushal bug. o tf_err can be left as garbage for syscalls. /usr/src/sys/i386/i386/locore.s: o Fix and update BDE_DEBUGGER support. o ENTRY(btext) before initialization was dangerous. o Warm boot shot was longer than intended. /usr/src/sys/i386/i386/machdep.c: o DON'T APPLY ALL OF THIS DIFF. It's what I'm using, but may require other changes. Use the following: o Remove aston() and setsoftclock(). Maybe use the following: o No netisr.h. o Spelling fix. o Delay to read the Rebooting message. o Fix for vm system unmapping a reduced area of memory after bounds_check_with_label() reduces the size of a physical i/o for a partition boundary. A similar fix is required in kern_physio.c. o Correct use of __CONCAT. It never worked here for non- ANSI cpp's. Is it time to drop support for non-ANSI? o gdt_segs init. 0xffffffffUL is bogus because ssd_limit is not 32 bits. The replacement may have the same value :-), but is more natural. o physmem was one page too low. Confusing variable names. Don't use the following: o Better numbers of buffers. Each 8K page requires up to 16 buffer headers. On my system, this results in 5576 buffers containing [up to] 2854912 bytes of memory. The usual allocation of about 384 buffers only holds 192K of disk if you use it on an fs with a block size of 512. o gdt changes for bdb. o TGT -> IDT changes for bdb. o #ifdefed changes for bdb. /usr/src/sys/i386/i386/microtime.s: o Use the correct asm macros. I think asm.h was copied from Mach just for microtime and isn't used now. It certainly doesn't belong in <sys>. Various macros are also duplicated in sys/i386/boot.h and libc/i386/.h. o Don't switch to and from the IRR; it is guaranteed to be selected (default after ICU init and explicitly selected in isa.c too, and never changed until the old microtime clobbered it). /usr/src/sys/i386/i386/support.s: o Non-essential changes (none related to spls or profiling). o Removed slow loads of %gs again. The LDT support may require not relying on %gs, but loading it is not the way to fix it! Some places (copyin ...) forgot to load it. Loading it clobbers the user %gs. trap() still loads it after certain types of faults so that fuword() etc can rely on it without loading it explicitly. Exception handlers don't restore it. If we want to preserve the user %gs, then the fastest method is to not touch it except for context switches. Comparing with VM_MAXUSER_ADDRESS and branching takes only 2 or 4 cycles on a 486, while loading %gs takes 9 cycles and using it takes another. o Fixed a signed branch to unsigned. /usr/src/sys/i386/i386/swtch.s: o Move spl0() outside of idle loop. o Remove cli/sti from idle loop. sw1 does a cli, and in the unlikely event of an interrupt occurring and whichqs becoming zero, sw1 will just jump back to _idle. o There's no spl0() function in asm any more, so use splz(). o swtch() doesn't need to be superaligned, at least with the new mcounting. o Fixed a signed branch to unsigned. o Removed astoff(). /usr/src/sys/i386/i386/trap.c: o The decentralized extern decls were inconsistent, of course. o Fixed typo MATH_EMULTATE in comments. / o Removed unused variables. o Old netmask is now impmask; print it instead. Perhaps we should print some of the new masks. o BTW, trap() should not print anything for normal debugger traps. /usr/src/sys/i386/include/asmacros.h: o DON'T APPLY ALL OF THIS DIFF. Just use some of the null macros as necessary. /usr/src/sys/i386/include/cpu.h: o CLKF_BASEPRI() changes since cpl == SWI_AST_MASK is now normal while the kernel is running. o Don't use var++ to set boolean variables. It fails after a mere 4G times :-) and is slower than storing a constant on [3-4]86s. /usr/src/sys/i386/include/cpufunc.h: o DON'T APPLY ALL OF THIS DIFF. You need mainly the include of <machine/ipl.h>. Unfortunately, <machine/ipl.h> is needed by almost everything for the inlines. /usr/src/sys/i386/include/ipl.h: o New file. Defines spl inlines and SWI macros and declares most variables related to hard and soft interrupt masks. /usr/src/sys/i386/isa/icu.h: o Moved definitions to <machine/ipl.h> /usr/src/sys/i386/isa/icu.s: o Software interrupts (SWIs) and delayed hardware interrupts (HWIs) are now handled uniformally, and dispatching them from splx() is more like dispatching them from _doreti. The dispatcher is essentially (handler[ffs(ipending & ~cpl)](). o More care (not quite enough) is taken to avoid unbounded nesting of interrupts. o The interface to softclock() is changed so that a trap frame is not required. o Fast interrupt handlers are now handled more uniformally. Configuration is still too early (new handlers would require bits in <machine/ipl.h> and functions to vector.s). o splnnn() and splx() are no longer here; they are inline functions (could be macros for other compilers). splz() is the nontrivial part of the old splx(). /usr/src/sys/i386/isa/ipl.h o New file. Supposed to have only bus-dependent stuff. Perhaps the h/w masks should be declared here. /usr/src/sys/i386/isa/isa.c: o DON'T APPLY ALL OF THIS DIFF. You need only things involving mask and MASK and comments about them. netmask is now a pure software mask. It works like the softclock mask. /usr/src/sys/i386/isa/vector.s: o Reorganize AUTO_EOI macros. o Option FAST_INTR_HANDLER_USERS_ES for people who don't trust fastintr handlers. o fastintr handlers need to metamorphose into ordinary interrupt handlers if their SWI bit has become set. Previously, sio had unintended latency for handling output completions and input of SLIP framing characters because this was not done. /usr/src/sys/net/netisr.h: o The machine-dependent stuff is now imported from <machine/ipl.h>. /usr/src/sys/sys/systm.h o DON'T APPLY ALL OF THIS DIFF. You need mainly the different splx() prototype. The spl*() prototypes are duplicated as inlines in <machine/ipl.h> but they need to be duplicated here in case there are no inlines. I sent systm.h and cpufunc.h to Garrett. We agree that spl0 should be replaced by splnone and not the other way around like I've done. /usr/src/sys/kern/kern_clock.c o splsoftclock() now lowers cpl so the direct call to softclock() works as intended. o softclock() interface changed to avoid passing the whole frame (some machines may need another change for profile_tick()). o profiling renamed _profiling to avoid ANSI namespace pollution. (I had to improve the mcount() interface and may as well fix it.) The GUPROF variant doesn't actually reference profiling here, but the 'U' in GUPROF should mean to select the microtimer mcount() and not change the interface.
# c8a13ecd	03-Jan-1994	David Greenman <dg@FreeBSD.org>	Convert syscall to trapframe. Based on work done by John Brezak.
# 0967373e	12-Nov-1993	David Greenman <dg@FreeBSD.org>	First steps in rewriting locore.s, and making info useful when the machine panics. i386/i386/locore.s: 1) got rid of most .set directives that were being used like #define's, and replaced them with appropriate #define's in the appropriate header files (accessed via genassym). 2) added comments to header inclusions and global definitions, and global variables 3) replaced some hardcoded constants with cpp defines (such as PDESIZE and others) 4) aligned all comments to the same column to make them easier to read 5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to /sys/i386/include/asmacros.h 6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code 7) added new global '_KERNend' to store last location+1 of kernel 8) cleaned up zeroing of bss so that only bss is zeroed 9) fix zeroing of page tables so that it really does zero them all - not just if they follow the bss. 10) rewrote page table initialization code so that 1) works correctly and 2) write protects the kernel text by default 11) properly initialize the kernel page directory, upages, p0stack PT, and page tables. The previous scheme was more than a bit screwy. 12) change allocation of virtual area of IO hole so that it is fixed at KERNBASE + 0xa0000. The previous scheme put it right after the kernel page tables and then later expected it to be at KERNBASE +0xa0000 13) change multiple bogus settings of user read/write of various areas of kernel VM - including the IO hole; we should never be accessing the IO hole in user mode through the kernel page tables 14) split kernel support routines such as bcopy, bzero, copyin, copyout, etc. into a seperate file 'support.s' 15) split swtch and related routines into a seperate 'swtch.s' 16) split routines related to traps, syscalls, and interrupts into a seperate file 'exception.s' 17) remove some unused global variables from locore that got inserted by Garrett when he pulled them out of some .h files. i386/isa/icu.s: 1) clean up global variable declarations 2) move in declaration of astpending and netisr i386/i386/pmap.c: 1) fix calculation of virtual_avail. It previously was calculated to be right in the middle of the kernel page tables - not a good place to start allocating kernel VM. 2) properly allocate kernel page dir/tables etc out of kernel map - previously only took out 2 pages. i386/i386/machdep.c: 1) modify boot() to print a warning that the system will reboot in PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user abort with a key on the console. The machine will wait for ever if a key is typed before the reboot. The default is 15 seconds, but can be set to 0 to mean don't wait at all, -1 to mean wait forever, or any positive value to wait for that many seconds. 2) print "Rebooting..." just before doing it. kern/subr_prf.c: 1) remove PANICWAIT as it is deprecated by the change to machdep.c i386/i386/trap.c: 1) add table of trap type strings and use it to print a real trap/ panic message rather than just a number. Lot's of work to be done here, but this is the first step. Symbolic traceback is in the TODO. i386/i386/Makefile.i386: 1) add support in to build support.s, exception.s and swtch.s ...and various changes to various header files to make all of the above happen.