History log of /freebsd-11-stable/sys/i386/i386/apic_vector.s
Revision Date Author Comments
# 329462 17-Feb-2018 kib

MFC r328083,328096,328116,328119,328120,328128,328135,328153,328157,
328166,328177,328199,328202,328205,328468,328470,328624,328625,328627,
328628,329214,329297,329365:

Meltdown mitigation by PTI, PCID optimization of PTI, and kernel use of IBRS
for some mitigations of Spectre.

Tested by: emaste, Arshan Khanifar <arshankhanifar@gmail.com>
Discussed with: jkim
Sponsored by: The FreeBSD Foundation


# 324299 05-Oct-2017 kib

MFC r324079:
Do not return from interrupt using the POP_FRAME;iret instruction
sequence; always jump to doreti.


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 298022 15-Apr-2016 sephe

hyperv: Deprecate HYPERV option by moving Hyper-V IDT vector into vmbus

Submitted by: Jun Su <junsu microsoft com>
Reviewed by: jhb, kib, sephe
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5910


# 282274 30-Apr-2015 jhb

Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains. Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead. Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
components for PV kernels. These components are now standard as they
are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision: https://reviews.freebsd.org/D2362
Reviewed by: royger
Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes: yes


# 282212 29-Apr-2015 whu

Microsoft vmbus, storage and other related driver enhancements for HyperV.
- Vmbus multi channel support.
- Vector interrupt support.
- Signal optimization.
- Storvsc driver performance improvement.
- Scatter and gather support for storvsc driver.
- Minor bug fix for KVP driver.
Thanks to royger, jhb and delphij from the FreeBSD community for the reviews
and comments. Also thanks to Hovy Xu from NetApp for the contributions to
the storvsc driver.

PR: 195238
Submitted by: whu
Reviewed by: royger, jhb, delphij
Approved by: royger
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Microsoft OSTC


# 281707 18-Apr-2015 kib

Remove lazy pmap switch code from i386. Naive benchmark with md(4)
shows no difference with the code removed.

On both amd64 and i386, assert that a released pmap is not active.

Proposed and reviewed by: alc
Discussed with: Svatopluk Kraus <onwahe@gmail.com>, peter
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 278473 09-Feb-2015 kib

Add x2APIC support. Enable it by default if CPU is capable. The
hw.x2apic_enable tunable allows disabling it from the loader prompt.

To closely repeat effects of the uncached memory ops when accessing
registers in the xAPIC mode, the x2APIC writes to MSRs are preceded
by mfence, except for the EOI notifications. This is probably too
strict, only ICR writes to send IPI require serialization to ensure
that other CPUs see the previous actions when IPI is delivered. This
may be changed later.
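
As a rough illustration of that ordering (a sketch only, not the committed
code; the wrmsr() wrapper and constant names are assumptions, though MSR
0x830 is the architectural x2APIC ICR), an IPI send in C might look like:

    #include <stdint.h>

    /* In x2APIC mode the ICR is a single 64-bit MSR (0x830). */
    #define X2APIC_MSR_ICR  0x830

    /* Thin wrapper around the privileged WRMSR instruction. */
    static inline void
    wrmsr(uint32_t msr, uint64_t val)
    {
        __asm__ __volatile__("wrmsr" : :
            "c"(msr), "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
    }

    /*
     * Send an IPI: destination APIC ID in bits 63:32, vector in the low
     * byte.  The mfence makes prior stores globally visible before the
     * write that triggers the IPI, which is the serialization the commit
     * message describes for ICR writes.
     */
    static inline void
    x2apic_ipi(uint32_t dest_id, uint8_t vector)
    {
        __asm__ __volatile__("mfence" : : : "memory");
        wrmsr(X2APIC_MSR_ICR, ((uint64_t)dest_id << 32) | vector);
    }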

In vmm justreturn IPI handler, call doreti_iret instead of doing iretd
inline, to handle corner conditions.

Note that the patch only switches LAPICs into x2APIC mode. It does not
enable FreeBSD to support > 255 CPUs, which requires parsing x2APIC
MADT entries and doing interrupt remapping, but it is the required step
on the way.

Reviewed by: neel
Tested by: pho (real hardware), neel (on bhyve)
Discussed with: jhb, grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 months


# 263001 11-Mar-2014 royger

amd64/i386: switch IPI handlers to C code.

Move asm IPIs handlers to C code, so both Xen and native IPI handlers
share the same code.

Reviewed by: jhb
Approved by: gibbs
Sponsored by: Citrix Systems R&D

amd64/amd64/apic_vector.S:
i386/i386/apic_vector.s:
- Remove asm coded IPI handlers and instead call the newly introduced
C variants.

amd64/amd64/mp_machdep.c:
i386/i386/mp_machdep.c:
- Add C coded clones to the asm IPI handlers (moved from
x86/xen/hvm.c).

i386/include/smp.h:
amd64/include/smp.h:
- Add prototypes for the C IPI handlers.

x86/xen/hvm.c:
- Move the C IPI handlers to mp_machdep and call those in the Xen IPI
handlers.

i386/xen/mp_machdep.c:
- Add dummy IPI handlers to the i386 Xen PV port (this port doesn't
support SMP).


# 255040 29-Aug-2013 gibbs

Implement vector callback for PVHVM and unify event channel implementations

Re-structure Xen HVM support so that:
- Xen is detected and hypercalls can be performed very
early in system startup.
- Xen interrupt services are implemented using FreeBSD's native
interrupt delivery infrastructure.
- the Xen interrupt service implementation is shared between PV
and HVM guests.
- Xen interrupt handlers can optionally use a filter handler
in order to avoid the overhead of dispatch to an interrupt
thread.
- interrupt load can be distributed among all available CPUs.
- the overhead of accessing the emulated local and I/O apics
on HVM is removed for event channel port events.
- a similar optimization can eventually, and fairly easily,
be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
Reserve IDT vector 0x93 for the Xen event channel upcall
interrupt handler. On Hypervisors that support the direct
vector callback feature, we can request that this vector be
called directly by an injected HVM interrupt event, instead
of a simulated PCI interrupt on the Xen platform PCI device.
This avoids all of the overhead of dealing with the emulated
I/O APIC and local APIC. It also means that the Hypervisor
can inject these events on any CPU, allowing upcalls for
different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
Increase the FreeBSD IRQ vector table to include space
for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
Remove Xen HVM per-cpu variable data. These fields are now
allocated via the dynamic per-cpu scheme. See xen_intr.c
for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
Prefer FreeBSD primitives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
Remove constants, macros, and functions unused in FreeBSD's Xen
support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
Introduce new functions xen_domain(), xen_pv_domain(), and
xen_hvm_domain(). These are used in favor of #ifdefs so that
FreeBSD can dynamically detect and adapt to the presence of
a hypervisor. The goal is to have an HVM optimized GENERIC,
but more is necessary before this is possible.
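
A minimal sketch of the runtime-detection pattern these functions enable,
replacing compile-time #ifdefs; the stub bodies and the
platform_setup_intr() consumer are hypothetical, and only the three
xen_*_domain() names come from the commit:

    #include <stdbool.h>

    /* Stand-in state and bodies so the sketch is self-contained. */
    static bool xen_detected, xen_is_hvm;

    static bool xen_domain(void)     { return (xen_detected); }
    static bool xen_hvm_domain(void) { return (xen_detected && xen_is_hvm); }
    static bool xen_pv_domain(void)  { return (xen_detected && !xen_is_hvm); }

    /* Hypothetical consumer: branch at runtime instead of with #ifdef. */
    void
    platform_setup_intr(void)
    {
        if (!xen_domain())
            return;                 /* bare metal: nothing Xen-specific */
        if (xen_hvm_domain()) {
            /* request the direct vector callback where available */
        } else if (xen_pv_domain()) {
            /* PV guests always take the event-channel upcall path */
        }
    }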

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
Refactor magic ioport, Hypercall table and Hypervisor shared
information page setup, and move it to a dedicated HVM support
module.

HVM mode initialization is now triggered during the
SI_SUB_HYPERVISOR phase of system startup. This currently
occurs just after the kernel VM is fully setup which is
just enough infrastructure to allow the hypercall table
and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
Add definitions and a method for configuring Hypervisor event
delivery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
Adjust kernel build to reflect the refactoring of early
Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
Since blkback defers all event handling to a taskqueue,
convert this task queue to a "fast" taskqueue, and schedule
it via an interrupt filter. This avoids an unnecessary
ithread context switch.

sys/xen/xenstore/xenstore.c:
The xenstore driver is MPSAFE. Indicate as much when
registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
Remove unused event channel APIs.

sys/xen/evtchn.h:
Remove all kernel Xen interrupt service API definitions
from this file. It is now only used for structure and
ioctl definitions related to the event channel userland
device driver.

Update the definitions in this file to match those from
NetBSD. Implementing this interface will be necessary for
Dom0 support.

sys/xen/evtchn/evtchnvar.h:
Add a header file for implementation-internal APIs related
to managing event channel event delivery. This is used
to allow, for example, the event channel userland device
driver to access low-level routines that typical kernel
consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
Standardize on the evtchn_port_t type for referring to
an event channel port id. In order to prevent low-level
event channel APIs from leaking to kernel consumers who
should not have access to this data, the type is defined
twice: Once in the Xen provided event_channel.h, and again
in xen/xen_intr.h. The double declaration is protected by
__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
twice within a given compilation unit.
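
The guarded double definition can be pictured as the fragment below;
uint32_t matches Xen's public headers, but the exact form is illustrative:

    #include <stdint.h>

    /*
     * Present in both xen/interface/event_channel.h and xen/xen_intr.h;
     * only the first definition seen by a compilation unit takes effect.
     */
    #ifndef __XEN_EVTCHN_PORT_DEFINED__
    typedef uint32_t evtchn_port_t;
    #define __XEN_EVTCHN_PORT_DEFINED__
    #endif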

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
New implementation of Xen interrupt services. This is
similar in many respects to the i386 PV implementation with
the exception that events bound to event channel ports
(i.e. not IPI, virtual IRQ, or physical IRQ) are further
optimized to avoid mask/unmask operations that aren't
necessary for these edge triggered events.

Stubs exist for supporting physical IRQ binding, but will
need additional work before this implementation can be
fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
Add support for placing vcpu_info into an arbitrary memory
page instead of using HYPERVISOR_shared_info->vcpu_info.
This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
Add support for the new event channel implementation.


# 235683 20-May-2012 iwasaki

Remove cpususpend IDT vector for XEN.
This broke XEN kernel building.


# 235622 18-May-2012 iwasaki

Add SMP/i386 suspend/resume support.
Most of it is merged from amd64.

- i386/acpica/acpi_wakecode.S
Replaced with amd64 code (from realmode to paging enabling code).

- i386/acpica/acpi_wakeup.c
Replaced with amd64 code (except for wakeup_pagetables stuff).

- i386/include/pcb.h
- i386/i386/genassym.c
Added new PCB members (CR0, CR2, CR4, DS, ES, FS, SS, GDT, IDT, LDT
and TR) needed for suspend/resume, not for context switch.

- i386/i386/swtch.s
Added suspendctx() and resumectx().
Note that savectx() was not changed and used for suspending (while
amd64 code uses it).
BSP and AP execute the same sequence, suspendctx(), acpi_wakecode()
and resumectx() for suspend/resume (in case of UP system also).

- i386/i386/apic_vector.s
Added cpususpend().

- i386/i386/mp_machdep.c
- i386/include/smp.h
Added cpususpend_handler().

- i386/include/apicvar.h
- kern/subr_smp.c
- sys/smp.h
Added IPI_SUSPEND and suspend_cpus().

- i386/i386/initcpu.c
- i386/i386/machdep.c
- i386/include/md_var.h
- pc98/pc98/machdep.c
Moved initializecpu() declarations to md_var.h.

MFC after: 3 days


# 217360 13-Jan-2011 jhb

If an interrupt on an I/O APIC is moved to a different CPU after it has
started to execute, it seems that the corresponding ISR bit in the "old"
local APIC can be cleared. This causes the local APIC interrupt routine
to fail to find an interrupt to service. Rather than panic'ing in this
case, simply return from the interrupt without sending an EOI to the
local APIC. If there are any other pending interrupts in other ISR
registers, the local APIC will assert a new interrupt.

Tested by: steve


# 214631 01-Nov-2010 jhb

Move <machine/apicreg.h> to <x86/apicreg.h>.


# 209483 23-Jun-2010 kib

Clear the DF bit in eflags/rflags on kernel entry. The i386 and amd64
ABIs specify that DF should be zero, and newer compilers do not clear
DF before using DF-sensitive instructions.

The DF clearing for signal handlers was done some time ago.

MFC after: 1 week


# 208507 24-May-2010 jhb

Add support for corrected machine check interrupts. CMCI is a new local
APIC interrupt that fires when a threshold of corrected machine check
events is reached. CMCI also includes a count of events when reporting
corrected errors in the bank's status register. Note that individual
banks may or may not support CMCI. If they do, each bank includes its own
threshold register that determines when the interrupt fires. Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl). The threshold
is also adjusted on each hourly poll which will lower the threshold once
events stop occurring.
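
A minimal sketch of that throttling strategy follows; the once-a-minute
target and the doubling come from the description above, while the function
name, the decrement on quiet polls, and the parameters are illustrative:

    #include <time.h>

    /*
     * Double the per-bank CMCI threshold while interrupts arrive more
     * often than once a minute; the periodic poll calls this with
     * polling != 0 and lowers the threshold again once events stop.
     */
    static int
    cmci_adjust_threshold(int threshold, int max_threshold,
        time_t now, time_t last_intr, int polling)
    {
        if (!polling && now - last_intr < 60) {
            if (threshold * 2 <= max_threshold)
                return (threshold * 2);
            return (max_threshold);
        }
        if (polling && threshold > 1)
            return (threshold - 1);     /* back off gradually when quiet */
        return (threshold);
    }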

Tested by: Sailaja Bangaru sbappana at yahoo com
MFC after: 1 month


# 205851 29-Mar-2010 jhb

Add a handler for the local APIC error interrupt. For now it just prints
out the current value of the local APIC error register when the interrupt
fires.

MFC after: 1 week


# 184113 21-Oct-2008 kmacy

don't globally define ipi_bitmap_handler on xen


# 165302 17-Dec-2006 kmacy

Evidently FreeBSD has long relied on the compiler to treat structures
passed by value (trap frames) as if they were in fact being passed by
reference. For better or worse, this incorrect behaviour is no longer
present in gcc 4.1. In this patch I convert all trapframe arguments to
be explicitly pass by reference. I also remove vm86_initflags, pushing
the very little work that it actually does up into vm86_prepcall.

Reviewed by: kan
Tested by: kan


# 158236 01-May-2006 jhb

Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction. This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.

MFC after: 1 month


# 153426 14-Dec-2005 jhb

Fix stale comment.


# 153146 05-Dec-2005 jhb

Change the i386 code to pass the interrupt vector as a separate argument
rather than embedding it in the intrframe as if_vec. This reduces diffs
with amd64 somewhat.
- Remove cf_vec from clockframe (it wasn't used anyway) and stop pushing
dummy vector arguments for ipi_bitmap_handler() and lapic_handle_timer()
since clockframe == trapframe now.
- Fix ddb to handle stack traces across interrupt entry points that just
have a trapframe on their stack and not a trapframe + vector.
- Change intr_execute_handlers() to take a trapframe rather than an
intrframe pointer.
- Change lapic_handle_intr() and atpic_handle_intr() to take a vector and
trapframe rather than an intrframe.
- GC struct intrframe now that nothing uses it anymore.
- GC CLOCK_TO_TRAPFRAME() and INTR_TO_TRAPFRAME().

Reviewed by: bde
Requested by: peter


# 153141 05-Dec-2005 jhb

- Move the code to deal with handling an IPI_STOP IPI out of
ipi_nmi_handler() and into a new cpustop_handler() function. Change
the Xcpustop IPI_STOP handler to call this function instead of
duplicating all the same logic in assembly.
- EOI the local APIC for the lapic timer interrupt in C rather than
assembly.
- Bump the lazypmap IPI counter if COUNT_IPIS is defined in C rather than
assembly.


# 153135 05-Dec-2005 jhb

- Move PUSH_FRAME and POP_FRAME into machine/asmacros.h.
- Add a new SET_KERNEL_SREGS macro that sets up %ds and %es to point to
kernel data and %fs to point to per-CPU data and use the new macro
in several kernel entry points including trap and interrupt handlers.
- Convert the IPI_STOP handler Xcpustop to push a standard trap frame
rather than an application frame.
- Make the TRAP() macro private to exception.s since it is only used
there.
- Move the PCPU_*() macros in asmacros.h out of the middle of the
profiling macros.

Reviewed by: bde
Requested by: bde (4, 5)


# 152701 22-Nov-2005 jhb

Make COUNT_IPIS and COUNT_XINVLTLB_HITS real kernel options and take
them out of machine/smptests.h.


# 152529 16-Nov-2005 jhb

Revert previous commit to these files. There isn't a race necessitating
an xchg instruction as we only try to execute the startup function if
the CPU ID is 0 (i.e. the BSP). I missed this earlier.


# 151632 24-Oct-2005 jhb

Use xchg in Xcpustop to close a race and make cpustop_restartfunc truly
one-shot in the SMP case (before, using the simple mov / cmp / mov sequence
could allow multiple CPUs to execute the restart function on resume).


# 150697 28-Sep-2005 jhb

Add interrupt counters for IPIs. By default they are disabled, but they
can be enabled by enabling COUNT_IPIS in smptests.h. When enabled, each
CPU provides an interrupt counter for nearly all of the IPIs it receives
(IPI_STOP currently doesn't have a counter) that can be examined using
vmstat -i, etc.

MFC after: 3 days
Requested by: rwatson


# 141538 08-Feb-2005 jhb

Use the local APIC timer to drive the various kernel clocks on SMP machines
rather than forwarding interrupts from the clock devices around using IPIs:
- Add an IDT vector that pushes a clock frame and calls
lapic_handle_timer().
- Add functions to program the local APIC timer including setting the
divisor, and setting up the timer to either down a periodic countdown
or one-shot countdown.
- Add a lapic_setup_clock() function that the BSP calls from
cpu_init_clocks() to setup the local APIC timer if it is going to be
used. The setup uses a one-shot countdown to calibrate the timer. We
then program the timer on each CPU to fire at a frequency of hz * 3.
stathz is defined as freq / 23 (hz * 3 / 23), and profhz is defined as
freq / 2 (hz * 3 / 2). This gives the clocks relatively prime divisors
while keeping a low LCM for the frequency of the clock interrupts (a
worked example of these divisors follows this list).
Thanks to Peter Jeremy for suggesting this approach.
- Remove the hardclock and statclock forwarding code including the two
associated IPIs. The bitmap IPI handler has now effectively degenerated
to just IPI_AST.
- When the local APIC timer is used we don't turn the RTC on at all, but
we still enable interrupts on the ISA timer 0 (i8254) for timecounting
purposes.
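
To make the divisor choice concrete, here is the arithmetic for an example
hz of 1000; only the hz * 3, / 23 and / 2 relationships come from the
commit, the rest is a worked example:

    #include <stdio.h>

    int
    main(void)
    {
        int hz = 1000;              /* example value; kern.hz is tunable */
        int freq = hz * 3;          /* per-CPU local APIC timer rate: 3000 */
        int stathz = freq / 23;     /* 130 with hz = 1000 */
        int profhz = freq / 2;      /* 1500 with hz = 1000 */

        /* 2, 3 and 23 share no factors, so the clock rates stay relatively
           prime while the timer itself runs at a modest frequency. */
        printf("hz=%d freq=%d stathz=%d profhz=%d\n", hz, freq, stathz, profhz);
        return (0);
    }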


# 138528 07-Dec-2004 ups

Avoid more than two pending IPI interrupt vectors per local APIC
as this may cause deadlocks.

This should fix kern/72123.

Discussed with: jhb
Tested by: Nik Azim Azam, Andy Farkas, Flack Man, Aykut KARA
Izzet BESKARDES, Jens Binnewies, Karl Keusgen
Approved by: sam (mentor)


# 129742 26-May-2004 bde

MFamd64:

Fixed profiling of trap, syscall and interrupt handlers and some
ordinary functions, essentially by backing out half of rev.1.106 of
i386/exception.s. The handlers must be between certain labels for
the purposes of profiling, and this was broken by scattering them in
separately compiled .s files, especially for ordinary functions that
ended up between the labels. Merge the files by #including them as
before, except with different pathnames and better comments and
organization. Changes to the scattered files are minimal -- just
move the labels to the file that does the #includes.

This also partly fixes profiling of IPIs -- all IPI handlers are now
correctly classified as interrupt handlers, but many are still missing
mcount calls.

vm86bios.s is included as before, but it is now between the labels for
interrupt handlers again, which seems to be wrong since half of it is
for a non-interrupt handler.


# 129620 23-May-2004 bde

MFamd64 (put TF_EIP in assym.s and use it instead of a magic offset in
FAKE_MCOUNT()s).


# 128328 16-Apr-2004 jhb

Use %eax rather than %ax when loading segment registers to avoid partial
register stalls.

Reviewed by: bde (a while ago, and I think an earlier version)


# 128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 125405 03-Feb-2004 jhb

Revert the skipping of segment register reloads as it appears to actually
be a pessimization on non Pentium4 CPUs. More importantly, it is buggy as
it can cause GPF's when using APM or vm86.


# 125161 28-Jan-2004 jhb

Optimize the i386 interrupt entry code to not reload the segment registers
if they already contain the correct kernel selectors.

Reviewed by: peter
Suggested by: peter


# 123106 02-Dec-2003 bde

Fixed panics in apic interrupt handlers if kernel profiling is turned
on. MCOUNT and FAKE_MCOUNT() may clobber all the call-used registers,
and one FAKE_MCOUNT() was placed so that an active %eax was clobbered.
The fix is to move this FAKE_MCOUNT() earlier where it should have
been anyway.

Fixed 3 layers of bitrot in the comment about why this FAKE_MCOUNT()
was where it was by removing the comment. (mcount() should be called
as early as possible after entering a new level, but an implementation
detail got in the way until 3 layers of changes ago.)

Kernel profiling still gives wrong results because the new interrupt
code rearranged object files too much. mcount() depends on trap,
syscall and interrupt handlers being between certain magic labels with
interrupt handlers last, and on nothing else being there. Splitting
up exception.o moved the magic labels to effectively random places
relative to what they are supposed to delimit. This mainly broke the
call graph; the flat profile is still usable.


# 122690 14-Nov-2003 jhb

Shuffle the APIC interrupt vectors around a bit:
- Move the IPI and local APIC interrupt vectors up into the 0xf0 - 0xff
range. The pmap lazyfix IPI was reordered down next to the TLB
shootdowns to avoid conflicting with the spurious interrupt vector.
- Move the base of APIC interrupts up 16 so that the first 16 APIC
interrupts do not overlap the vectors used by the ATPIC.
- Remove bogus interrupt vector reservations for LINT[01].
- Now that 0xc0 - 0xef are available, use them for device interrupts.
This increases the number of APIC device interrupts to 191.
- Increase the system-wide number of global interrupts to 191 to catch up
to more APIC interrupts.

Requested by: peter (2)


# 122620 13-Nov-2003 jhb

Whitespace.


# 122572 12-Nov-2003 jhb

- Move manipulation of td_intr_nesting_level out of assembly interrupt
vector stubs and into the C functions they call.
- Move disabling and EOIing of interrupt sources out of PIC driver entry
points and into intr_execute_handlers(). Intr_execute_handlers() only
disables a source for an interrupt if it is a stray interrupt or has
threaded handlers. Sources with fast handlers no longer disable (mask)
the source while executing the handlers.
- Move the setting of clkintr_pending into intr_execute_handlers() and set
the variable for any interrupt source with a vector of 0. (Should only
be true for IRQ 0.) This fixes clkintr_pending in the NO_MIXED_MODE
case.
- Implement lapic_eoi() and use it to implement ioapic_eoi_source().
- Rename atpic_sched_ithd() to atpic_handle_intr() since it is used to
handle all atpic interrupts and not just threaded ones.

Inspired by: peter's changes to amd64 in p4 (1)
Requested by: bde (2)


# 121986 03-Nov-2003 jhb

New APIC support code:

- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always disable the local APIC to work around an erratum in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
That mapping for interrupt sources to interrupt vectors is up to the
APIC enumerator driver however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to pseudo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.


# 121133 16-Oct-2003 bde

Don't forget to load %es with the kernel data segment selector in
Xcpustop(). %es is used in at least the call to savectx() when savectx()
calls bcopy(), so not loading it was fatal if a stop IPI interrupts
user mode.

This reduces bugs starting and stopping CPUs for debuggers. CPUs are
stopped mainly in kdb_trap() and cpu_reset(). At reset time there is
a good chance that all the CPUs are in the kernel, so the bug was
probably harmless then.


# 117372 09-Jul-2003 peter

unifdef -DLAZY_SWITCH and start to tidy up the associated glue.


# 115907 06-Jun-2003 jhb

- Use IDTVEC() to declare IPI handlers since they are also IDT vectors.
- Make handlers for IPI's used by SMP kernels #ifdef SMP.


# 112993 02-Apr-2003 peter

Commit a partial lazy thread switch mechanism for i386. it isn't as lazy
as it could be and can do with some more cleanup. Currently it's under
options LAZY_SWITCH. What this does is avoid %cr3 reloads for short
context switches that do not involve another user process. ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb. However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still. There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

It's not compiled in unless you add the LAZY_SWITCH option. I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff. This was more
meant for the kthread case, while his was for interrupts. Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place. This allows us to catch the silly
case of doing a cpu_switch() to the current process. This happens
uncomfortably often. This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle). This has been
implemented on i386 and (thanks to jake) sparc64. The others will come
soon. This is actually separate from the lazy switch stuff.

Glanced at by: jake, jhb


# 110296 03-Feb-2003 jake

Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway. This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by: jhb, tmm
Tested on: i386, sparc64


# 102329 23-Aug-2002 peter

Ok, somebody please shoot me. The asm I wrote for the ranged IPI shootdown
was wrong. It only ever invalidated one page due to me getting the loop
terminator wrong. This explains the DISABLE_PG_G effect on SMP.


# 99862 12-Jul-2002 peter

Revive backed out pmap related changes from Feb 2002. The highlights are:
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive,
so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns. For example, there is no sense in
globally purging the TLB cache for where we are stealing a page from
the local unshared process on the local cpu. Use pm_active to track
this.
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
enough to cause vm86 bios calls to break. vm86 depended on our existing
bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
pain than they did.

Some more work is needed:
- pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we
have a hook in cpu_switch.
- The IPI handlers need some cleanup. I have a bogus %ds load that can
be avoided.
- APTD handling is rather bogus and appears to be a large source of
global TLB IPI shootdowns for no really good reason.

I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.

I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative. I'll revisit these
again over the next few days as the dust settles.

New option: DISABLE_PG_G - In case I missed something.


# 99766 11-Jul-2002 peter

Bah, move the invltlb counter to C code and hook a debug sysctl onto it.


# 99765 11-Jul-2002 peter

s/NCPU/MAXCPU/ to try and get this to compile.


# 93607 01-Apr-2002 dillon

Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter()
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.

Back out and re-apply improperly committed syntactical cleanups made to files
that were still under active development. Back out improperly committed program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather then in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.

Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.

Reviewed by: jake


# 93264 27-Mar-2002 dillon

Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


# 91367 27-Feb-2002 peter

Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.


# 91328 26-Feb-2002 dillon

revert last commit temporarily due to whining on the lists.


# 91315 26-Feb-2002 dillon

STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().

This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not affected by this change.

This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.

This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, are strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.

The following changes have been made:

* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section (a minimal C sketch of this scheme follows this list).

Other architectures are unaffected.

* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather than hardcoded
in MI code.

The new code places the burden of entering the critical section
in the trampoline code where it belongs.

* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.

* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.

* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.

* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.

* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.

* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to use locked bus cycles when modifying them.

Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.

* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.
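
Reduced to a minimal C sketch (the stand-in thread structure, the per-CPU
flag, and the unpend() name are assumptions; only the td_critnest counting
and the defer-until-zero behavior come from the description above):

    #include <stdbool.h>

    /* Minimal stand-ins so the sketch is self-contained. */
    struct thread { int td_critnest; };
    static struct thread thread0;
    #define curthread (&thread0)

    static bool int_pending;            /* a per-CPU mask in the real code */

    static void
    unpend(void)
    {
        int_pending = false;            /* run whatever was deferred */
    }

    void
    critical_enter(void)
    {
        curthread->td_critnest++;       /* no hard interrupt disable */
    }

    void
    critical_exit(void)
    {
        if (--curthread->td_critnest == 0 && int_pending)
            unpend();                   /* deal with deferred interrupts */
    }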

Performance issues:

One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.

The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather than defer.

The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).

The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().

Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.


# 91260 25-Feb-2002 peter

Work-in-progress commit syncing up pmap cleanups that I have been working
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.

This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen a world
speedups by a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.


# 88900 05-Jan-2002 jhb

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex release and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


# 88322 20-Dec-2001 jhb

Introduce a standard name for the lock protecting an interrupt controller
and its associated state variables: icu_lock with the name "icu". This
renames the imen_mtx for x86 SMP, but also uses the lock to protect
access to the 8259 PIC on x86 UP. This also adds an appropriate lock to
the various Alpha chipsets which fixes problems with Alpha SMP machines
dropping interrupts with an SMP kernel.


# 84733 09-Oct-2001 iedowse

Remove the Xresume* labels from the i386 interrupt handlers; the
code in ipl.s and icu_ipl.s that used them was removed when the
interrupt thread system was committed. Debuggers also knew about
Xresume* because these labels hide the real names of the interrupt
handlers (Xintr*), and debuggers need to special-case interrupt
handlers to get the interrupt frame.

Both gdb and ddb will now use the Xintr* and Xfastintr* symbols to
detect interrupt frames. Fast interrupt frames were never identified
correctly before, so this fixes the problem of the running stack
frame getting lost in a ddb or gdb trace generated from a fast
interrupt - e.g. when debugging a simple infinite loop in the kernel
using a serial console, the frame containing the loop would never
appear in a gdb or ddb trace.

Reviewed by: jhb, bde


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delivered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the APs to execute forward_hardclock/statclock, which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


# 75393 10-Apr-2001 jhb

Remove the BETTER_CLOCK #ifdef's. The code is on by default and is here
to stay for the foreseeable future.

OK'd by: peter (the idea)


# 73586 05-Mar-2001 jhb

Don't enable interrupts before calling sched_ithd for threaded interrupts.

Tested by: obrien


# 73011 25-Feb-2001 jake

Remove the leading underscore from all symbols defined in x86 asm
and used in C or vice versa. The elf compiler uses the same names
for both. Remove asnames.h with great prejudice; it has served its
purpose.

Note that this does not affect the ability to generate an aout kernel
due to gcc's -mno-underscores option.

moral support from: peter, jhb


# 72276 10-Feb-2001 jhb

- Make astpending and need_resched process attributes rather than CPU
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accessed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.


# 72220 09-Feb-2001 jhb

Remove unused forward_irq counters.


# 71532 24-Jan-2001 jhb

- Remove all the #if 0'd code that used to implement IRQ forwarding.
- Remove #if 0'd lazy interrupt mask.


# 71337 21-Jan-2001 jake

Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By: peter


# 71321 21-Jan-2001 peter

Remove APIC_INTR_DIAGNOSTIC - this has been disabled for some time now.
Remove some leftovers of removed SMP options.


# 71245 19-Jan-2001 peter

Remove reference to splz_unpend - it is long gone.


# 70006 14-Dec-2000 jake

Use _lapic+offset to access the local apic from assembly language
files, rather than the symbols in globals.s. The offsets are
generated by genassym.


# 69971 13-Dec-2000 jake

Introduce a new potentially cleaner interface for accessing per-cpu
variables from i386 assembly language. The syntax is PCPU(member)
where member is the capitalized name of the per-cpu variable, without
the gd_ prefix. Example: movl %eax,PCPU(CURPROC). The capitalization
is due to using the offsets generated by genassym rather than the symbols
provided by linking with globals.o. asmacros.h is the wrong place for
this but it seemed as good a place as any for now. The old implementation
in asnames.h has not been removed because it is still used to de-mangle
the symbols used by the C variables for the UP case.


# 69578 04-Dec-2000 peter

Cleanup some leftover lint from the old interrupt system.
Also, while here, run up to 32 interrupt sources on APIC systems.
Normalize INTREN/INTRDIS so they are the same on both UP and SMP systems
rather than sometimes a macro, and sometimes a function.

Reviewed by: jhb, jakeb


# 69431 01-Dec-2000 jake

Change doreti to take a trapframe instead of an intrframe.
Remove associated pushes of dummy units to convert frame.

Reviewed by: jhb


# 66716 06-Oct-2000 jhb

- Change fast interrupts on x86 to push a full interrupt frame and to
return through doreti to handle ast's. This is necessary for the
clock interrupts to work properly.
- Change the clock interrupts on the x86 to be fast instead of threaded.
This is needed because both hardclock() and statclock() need to run in
the context of the current process, not in a separate thread context.
- Kill the prevproc hack as it is no longer needed.
- We really need Giant when we call psignal(), but we don't want to block
during the clock interrupt. Instead, use two p_flag's in the proc struct
to mark the current process as having a pending SIGVTALRM or a SIGPROF
and let them be delivered during ast() when hardclock() has finished
running.
- Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways. It was
broken on the x86 if it was turned on since cpl is gone. Its only use
was to bogusly run softclock() directly during hardclock() rather than
scheduling an SWI.
- Remove the COM_LOCK simplelock and replace it with a clock_lock spin
mutex. Since the spin mutex already handles disabling/restoring
interrupts appropriately, this also lets us axe all the *_intr() fu.
- Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use
temporary fast interrupts for the APIC trial.
- Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending
signals in hardclock() that are to be delivered in ast().

Submitted by: jakeb (making statclock safe in a fast interrupt)
Submitted by: cp (concept of delaying signals until ast())


# 66698 05-Oct-2000 jhb

- Heavyweight interrupt threads on the alpha for device I/O interrupts.
- Make softinterrupts (SWI's) almost completely MI, and divorce them
completely from the x86 hardware interrupt code.
- The ihandlers array is now gone. Instead, there is a MI shandlers array
that just contains SWI handlers.
- Most of the former machine/ipl.h files have moved to a new sys/ipl.h.
- Stub out all the spl*() functions on all architectures.

Submitted by: dfr


# 65557 06-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 61339 06-Jun-2000 dillon

INTR_TYPE_FAST / FAST_INTR interrupts (currently just serial interrupts)
have their own lock and do not need the MP lock. The SMP cleanup was
a little too conservative in MP locking fast interrupts but at least
it's trivial to fix. MFC soon.

Submitted by: bde


# 61130 31-May-2000 bde

Pack the SWI bits to save some time and space.


# 61076 29-May-2000 dfr

I forgot that the previous change was...

Submitted by: bde


# 61075 29-May-2000 dfr

Add SWI_TQ_MASK to imask definition.


# 60303 09-May-2000 obrien

1. `movl' is for use with 32-bit operands. Do NOT use it with 16-bit
operands. `movw' could be used, but instead let the assembler decide
the right instruction to use.
2. AT&T asm syntax requires a leading '*' in front of the operand for
indirect calls and jumps.
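
A small standalone illustration of the second point, as GCC inline asm in C
(the function name is made up; only the '*' syntax is the point here):

    /* The leading '*' tells the assembler this is an indirect call through
     * the operand; without it the operand would be taken as a direct target. */
    static void
    indirect_call(void (*fn)(void))
    {
            __asm__ __volatile__("call *%0"
                : : "r" (fn)
                : "eax", "ecx", "edx", "cc", "memory");
    }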


# 58764 29-Mar-2000 dillon

The SMP cleanup commit broke need_resched; this fixes that and also
removes unnecessary MPLOCKED and 'lock' prefixes from the interrupt
nesting level, since (A) the MP lock is held at the time, and (B) the
nesting level is restored prior to returning, so any interrupted code
will see a consistent value.


# 58717 28-Mar-2000 dillon

Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).

A great deal of conditional SMP code for various dead-ended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.

Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.

This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.

This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch


# 53425 19-Nov-1999 dillon

Optimize two cases in the MP locking code. First, it is not necessary
to use a locked cmpxchg when unlocking a lock that we already hold, since
nobody else can touch the lock while we hold it. Second, it is not
necessary to use a locked cmpxchg when locking a lock that we already
hold, for the same reason. These changes will allow MP locks to be used
recursively without impacting performance.

Modify two procedures that are called only by assembly and are already
NOPROF entries to pass a critical argument in %edx instead of on the
stack, removing a significant amount of code from the critical path
as a consequence.

Reviewed by: Alfred Perlstein <bright@wintelcom.net>, Peter Wemm <peter@netplex.com.au>
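
A sketch of the recursion fast path described above, in C with made-up names
(the real mp_lock is a single word manipulated from assembly; this only shows
why the locked cmpxchg can be skipped when the lock is already held):

    struct rlock {
            volatile int    owner;          /* CPU id, or -1 when free */
            int             depth;          /* recursion count */
    };

    static void
    rlock_acquire(struct rlock *l, int mycpu)
    {
            if (l->owner == mycpu) {
                    /* Already held by us: no other CPU can touch the lock,
                     * so a plain increment is enough -- no locked cmpxchg. */
                    l->depth++;
                    return;
            }
            /* Contended path: atomic_cmpset_int() stands in for the
             * locked cmpxchg loop. */
            while (!atomic_cmpset_int((volatile u_int *)&l->owner,
                (u_int)-1, (u_int)mycpu))
                    ;                       /* spin */
            l->depth = 1;
    }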


# 51114 09-Sep-1999 msmith

Invoke smp_rendezvous_action() using the a.out compatible asnames.h
technique (bleagh).


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 50251 23-Aug-1999 alc

Modify the macros IMASK_UNLOCK, CPL_UNLOCK, and REL_FAST_INTR_LOCK
to perform the s_unlock inline.


# 48924 20-Jul-1999 msmith

Implement an all-CPU shootdown-style rendezvous facility. This allows
the caller to specify a function to be guarded between an entry and exit
barrier, as well as pre- and post-barrier functions.

The primary use for this function is synchronised update of per-cpu private
data. The implementation is almost (but not quite) MI; with a better
mechanism for masking per-CPU interrupts it could probably be hoisted.

Reviewed by: peter (partially)
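
This is the smp_rendezvous() facility (cf. the r51114 entry above); a usage
sketch along the lines described, where update_private_data() and
sync_all_cpus() are hypothetical:

    /* Runs on every CPU between the entry and exit barriers that
     * smp_rendezvous() provides. */
    static void
    update_private_data(void *arg)
    {
            /* synchronised update of per-cpu private data goes here */
    }

    static void
    sync_all_cpus(void *arg)
    {
            /* NULL pre-/post-barrier functions: only the guarded action */
            smp_rendezvous(NULL, update_private_data, NULL, arg);
    }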


# 48729 10-Jul-1999 bde

Go back to the old (icu.s rev.1.7 1993) way of keeping the AST-pending
bit separate from ipending, since this is simpler and/or necessary for
SMP and may even be better for UP.

Reviewed by: alc, luoqi, tegge


# 48505 03-Jul-1999 alc

An SMP-specific change: Add the lock prefix to RMW operations
on ipending.
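
What "adding the lock prefix to a RMW operation on ipending" amounts to,
shown as a standalone GCC inline-asm helper in C (illustrative, not the
kernel's actual macro):

    /* Atomically set one bit in a word shared between CPUs.  Without the
     * lock prefix the read-modify-write done by orl is not atomic with
     * respect to the other CPUs. */
    static __inline void
    atomic_set_bit(volatile unsigned int *word, unsigned int bit)
    {
            __asm__ __volatile__("lock; orl %1,%0"
                : "+m" (*word)
                : "ir" (1u << bit)
                : "cc");
    }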


# 47942 16-Jun-1999 tegge

Clean up bitrot in interrupt tracing code.


# 47678 01-Jun-1999 jlemon

Unifdef VM86.

Reviewed by: silence on -current


# 47588 28-May-1999 bde

Fixed glitches (jumps) of about 1/HZ seconds for the i8254 timecounter.
The old version only worked right when the time was read strictly
more often than every 1/HZ seconds, but we only guarantee reading
it every (1/HZ + epsilon) seconds. Part of rev.1.126-1.127 attempted
to fix this but didn't succeed. Detect counter rollover using the
heuristic from the old version of microtime() with additional
complications for supporting calls from fast interrupt handlers.
This works provided i8254 interrupts are not delayed by more than
1/(2*HZ) seconds.

This needs more comments, and cleanups for the SMP case, and more
testing of the SMP case before it is merged into RELENG_3.

Tested by: jhay


# 46129 27-Apr-1999 luoqi

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages; per-cpu global variables are now
accessed through this new selector (%fs). The selectors in the gdt table are
rearranged for cache line optimization.
- fast_vfork is now on by default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


# 45676 14-Apr-1999 bde

Generate intrnames[] dynamically. This should be new-bus friendly.

Old version reviewed by: se


# 45555 10-Apr-1999 tegge

Test CF after a btrl operation instead of testing ZF (which is undefined).
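
The point in code form: btrl leaves the original value of the tested bit in
CF, while ZF is architecturally undefined afterwards, so the result must be
taken from CF. A standalone GCC inline-asm illustration in C:

    /* Test-and-clear a bit; returns the bit's previous value.  setc copies
     * CF -- the only flag btrl defines -- into the result. */
    static __inline int
    test_and_clear_bit(volatile unsigned int *word, int bit)
    {
            unsigned char was_set;

            __asm__ __volatile__("btrl %2,%1; setc %0"
                : "=q" (was_set), "+m" (*word)
                : "Ir" (bit)
                : "cc");
            return (was_set);
    }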


# 38888 06-Sep-1998 tegge

Maintain a mapping from irq number to (ioapic number, int pin) tuple,
and use this when masking/unmasking interrupts.

Maintain a mapping from (ioapic number, int pin) tuple to irq number,
and use this when configuring devices and programming the ioapics.

Previous code assumed that irq number was equal to int pin number, and
that the ioapic number was 0.

Don't let an AP enter _cpu_switch before all local apics are initialized.
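
One plausible shape for the two mappings described above, as a C sketch (the
array names and sizes are made up, not the actual data structures):

    #define NIRQ    24              /* assumed number of interrupt sources */
    #define NIOAPIC 2               /* assumed number of I/O APICs */
    #define NPIN    24              /* assumed pins per I/O APIC */

    /* irq -> (ioapic, pin): consulted when masking/unmasking an interrupt */
    static struct {
            int     apic;
            int     pin;
    } irq_to_pin[NIRQ];

    /* (ioapic, pin) -> irq: consulted when configuring devices and
     * programming the I/O APIC redirection entries */
    static int pin_to_irq[NIOAPIC][NPIN];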


# 38824 04-Sep-1998 luoqi

Make irq forwarding truly functional.


# 38246 11-Aug-1998 bde

Register tty software interrupt handlers at run time using register_swi()
instead of at compile time using ifdefs.

Use _swi_null instead of dummycamisr. CAM and dpt should call
register_swi() instead of hacking on ihandlers[] directly.
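
A hedged sketch of the registration call the entry asks drivers to use; the
prototype, the swihand_t type, and the SWI_CAMBIO constant are assumptions
here, not verified copies of the old interrupt headers:

    typedef void swihand_t(void);
    void register_swi(int intr, swihand_t *handler);    /* assumed prototype */

    static void cam_swi_handler(void);                  /* hypothetical handler */

    static void
    cam_attach_swi(void)
    {
            /* instead of writing a pointer into ihandlers[] directly */
            register_swi(SWI_CAMBIO, cam_swi_handler);
    }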


# 38244 11-Aug-1998 bde

Implemented dynamic registration of software interrupt handlers. Not
used yet.

Use dummy SWI handlers to avoid some checks for null pointers.


# 36135 17-May-1998 tegge

Add forwarding of roundrobin to other cpus. This gives a more regular
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).

Don't attempt to switch to the BSP in boot(). If the system was idle when
an interrupt caused a panic, this won't work. Instead, switch to the BSP
in cpu_reset.

Remove some spurious forward_statclock/forward_hardclock warnings.


# 35393 22-Apr-1998 tegge

Mask the interrupt before setting the corresponding bit in ipending if
the interrupt is already active.
Don't use lock prefix for operations on ipending.
Always use lock prefix for operations on iactive.


# 34058 05-Mar-1998 tegge

Remove special handling for resuming clock interrupt when using APIC_IO.
The `generic' vector stubs do the right thing.


# 34021 03-Mar-1998 tegge

When entering the apic version of the slow interrupt handler, level
interrupts are masked, and EOI is sent iff the corresponding ISR bit
is set in the local apic. If the CPU cannot obtain the interrupt
service lock (currently the global kernel lock) the interrupt is
forwarded to the CPU holding that lock.

Clock interrupts now have higher priority than other slow interrupts.
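
The entry path described above, as a C-flavoured sketch (every helper name
here is made up; the real code is assembly in apic_vector.s):

    static void
    slow_interrupt_entry(int irq)
    {
            mask_level_interrupt(irq);      /* level interrupts are masked first */
            if (lapic_isr_bit_set(irq))     /* send EOI only if the ISR bit is set */
                    lapic_eoi();
            if (try_interrupt_service_lock()) {     /* currently the global kernel lock */
                    run_handler(irq);
                    release_interrupt_service_lock();
            } else {
                    forward_to_lock_holder(irq);    /* the lock holder runs it */
            }
    }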


# 34020 03-Mar-1998 tegge

Forward the signal if the process runs on a different CPU. This reduces
the signal handling latency for cpu-bound processes that perform very
few system calls.

The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.


# 32518 15-Jan-1998 gibbs

Addition of splsoftvm and a VM SWI to handle bus dma related callbacks.
This SWI may be useful for other deferred VM tasks.


# 31638 08-Dec-1997 fsmp

The improvements to clock statistics by Tor Egge,
wrapped and enabled by the define BETTER_CLOCK (on by default in smptests.h).

apic_vector.s also contains a small change I (smp) made to eliminate
the double level INT problem. It seems stable, but I haven't the tools
in place to prove it fixes the problem.

Reviewed by: smp@csn.net
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


# 29945 28-Sep-1997 gibbs

Fix a serious bug I introduced while adding in support for CAM interrupts.
It seems I didn't count my 0's properly when adding the new masks into
icu_vector.s, pushing SWI_AST_MASK off the end of the array and screwing
up the indexing for SWI_CLOCK_MASK.

Fix the bug in icu_vector.s and also reformat the code in both icu_vector.s and
apic_vector.s so that it will be much harder to make the same mistake in
the future.

Submitted by: Bruce Evans <bde@zeta.org.au>


# 29677 21-Sep-1997 gibbs

aha1542.c aic6360.c cy.c fd.c ft.c
if_ie.c if_wl.c if_zp.c isa.c isa_device.h
labpc.c mcd.c ncr5380.c scd.c seagate.c si.c
sio.c tw.c ultra14f.c wcd.c wd.c:

Update for changes in the callout interface.

apic_vector.s icu_vector.s ipl.s ipl_funcs.c:

Add CAM software/hardware interrupt support.


# 29213 07-Sep-1997 fsmp

General cleanup of the lock pushdown code. They are grouped and enabled
from machine/smptests.h:

#define PUSHDOWN_LEVEL_1
#define PUSHDOWN_LEVEL_2
#define PUSHDOWN_LEVEL_3
#define PUSHDOWN_LEVEL_4_NOT


# 28921 30-Aug-1997 fsmp

Another round of lock pushdown.
Add a simplelock to deal with disable_intr()/enable_intr() as used in the UP kernel.
The UP kernel expects that this is enough to guarantee exclusive access to
regions of code bracketed by these 2 functions.
Add a simplelock to bracket clock accesses in clock.c: clock_lock.

Help from: Bruce Evans <bde@zeta.org.au>


# 28910 29-Aug-1997 fsmp

Support for the new FAST_HI algorithm, enabled.

Preliminary support for the INTR_SIMPLELOCK algorithm, disabled.
Note that this code is NOT ready.


# 28641 23-Aug-1997 fsmp

The last of the encapsulation of cpl/spl/ipending things into a critical
region protected by the simplelock 'cpl_lock'.

Notes:

- this code is currently controlled on a section by section basis with
defines in machine/param.h. All sections are currently enabled.

- this code is not as clean as I would like, but that can wait till later.

- the "giant lock" still surrounds most instances of this "cpl region".
I still have to do the code that arbitrates setting cpl between the
top and bottom halves of the kernel.

- the possibility of deadlock exists; I am committing the code at this
point so as to exercise it and detect any such cases before the "giant lock"
is removed.


# 28487 21-Aug-1997 fsmp

Made PEND_INTS default.
Made NEW_STRATEGY default.
Removed misc. old cruft.

Centralized simple locks into mp_machdep.c
Centralized simple lock macros into param.h

More cleanup in the direction of making splxx()/cpl MP-safe.


# 28442 20-Aug-1997 fsmp

Preparation for moving cpl into critical region access.
Several new fine-grained locks.
New FAST_INTR() methods:
- separate simplelock for FAST_INTR, no more giant lock.
- FAST_INTR()s no longer check ipending on the way out of the ISR.
sio made MP-safe (I hope).


# 28043 10-Aug-1997 fsmp

Added trap specific lock calls: get_fpu_lock, etc.
All resolve to the GIANT_LOCK at this time; it is purely a logical partitioning.


# 28023 09-Aug-1997 fsmp

Added 'lock' instruction before 3 places that update ipending.
This may or may not fix the "high IO freezes SMP kernel" problem.


# 27779 31-Jul-1997 fsmp

Converted the TEST_LOPRIO code to default.
Removed PEND_INTS 1st try.
Direct call to MPtrylock.


# 27728 28-Jul-1997 fsmp

Modified the PEND_INTS algorithm to fix the ISA INT loss problem.

Noticed by: dave adkins <adkin003@gold.tc.umn.edu> and others.


# 27696 25-Jul-1997 fsmp

clock.c:
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.

apic_vector.s:
- new algorithm where a CPU uses try_mplock instead of get_mplock:
if successful, continue as before;
if it fails, set the ipending bit, mask the INT (to avoid recursion), clean up and iret.

This allows the CPU to return to successful work, while the ISR will be run
by the CPU holding the lock as part of the doreti dance.
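
The same algorithm as a C-flavoured sketch (try_mplock is named above; the
other helpers are hypothetical, and the real code is assembly):

    static void
    hw_interrupt_entry(int irq)
    {
            if (try_mplock()) {
                    run_handler(irq);       /* got the lock: run the ISR */
                    rel_mplock();
                    /* ... doreti, iret ... */
            } else {
                    set_ipending_bit(irq);  /* remember the interrupt */
                    mask_irq(irq);          /* avoid recursion on iret */
                    /* cleanup & iret; the CPU holding the lock will run
                     * the ISR from doreti when it sees the ipending bit */
            }
    }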


# 27634 23-Jul-1997 fsmp

New simple_lock code in asm:
- s_lock_init()
- s_lock()
- s_lock_try()
- s_unlock()

Created lock for IO APIC and apic_imen (SMP version of imen)
- imen_lock

Code to use imen_lock for access from apic_ipl.s and apic_vector.s.
Moved this code *outside* of mp_lock.

It seems to work!!!
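
A usage sketch of the interface listed above, guarding the IO APIC/apic_imen
accesses this entry mentions (the struct layout and C prototypes shown are
assumptions; the implementation itself is in assembly):

    struct simplelock { volatile int lock_data; };
    void s_lock_init(struct simplelock *);
    void s_lock(struct simplelock *);
    int  s_lock_try(struct simplelock *);
    void s_unlock(struct simplelock *);

    static struct simplelock imen_lock;     /* guards apic_imen and the IO APIC */

    static void
    imen_update(void)
    {
            s_lock(&imen_lock);             /* taken outside of mp_lock */
            /* ... modify apic_imen and reprogram the IO APIC here ... */
            s_unlock(&imen_lock);
    }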


# 27517 18-Jul-1997 fsmp

Split TEST_CPUSTOP code into CPUSTOP_ON_DDBBREAK and mainline code.


# 27408 15-Jul-1997 fsmp

Cleanup.


# 27406 15-Jul-1997 fsmp

Tighten up asm code for EOI access.


# 27352 12-Jul-1997 fsmp

Clean up old stop_cpus()/restart_cpus() cruft.
New code for handling mixed-mode 8259/APIC programming without 'ExtInt'.
New code to control other CPUs: stop_cpus()/restart_cpus()/_Xstopcpu.


# 27251 06-Jul-1997 fsmp

First cut at code for handling "spurious INTerrupts".
First cut at code for handling CPU stop/restart.

Notes:
not working properly yet.


# 27007 27-Jun-1997 fsmp

apic_vector.s:
- added Xcpustop IPI code to support stop_cpus()/restart_cpus().
it is off by default; enable via smptests.h:TEST_CPUSTOP

intr_machdep.h:
- moved +ICULEN to lower level.
- added entry for Xcpustop.


# 26812 22-Jun-1997 peter

Preliminary support for per-cpu data pages.

This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])

There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.

This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.

Reviewed by: fsmp, dyson


# 26302 31-May-1997 peter

The SWI_NET_MASK and SWI_TTY_MASK handlers are now back adjacent to the
top of the hardware interrupt handlers. Apparently this is slightly
faster with the bit scanning instruction that looks these up - this set of
changes reverts the original change.

Reviewed by: bde


# 26168 26-May-1997 fsmp

Split vector.s into UP and SMP specific files:
- vector.s <- stub called by i386/exception.s
- icu_vector.s <- UP
- apic_vector.s <- SMP

Split icu.s into UP and SMP specific files:
- ipl.s <- stub called by i386/exception.s (formerly icu.s)
- icu_ipl.s <- UP
- apic_ipl.s <- SMP

This was done in preparation for massive changes to the SMP INTerrupt
mechanisms. More fine tuning, such as merging ipl.s into exception.s,
may be appropriate.