Cross Reference: /freebsd-9.3-release/sys/x86/x86/local

History log of /freebsd-9.3-release/sys/x86/x86/local_apic.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 267654	19-Jun-2014	gjb	Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-9.3-release
# 258218	16-Nov-2013	mav	MFC r250576 (by eadler): Fix several typos PR: kern/176054
# 247877	06-Mar-2013	avg	MFC r246247: x86 suspend/resume: suspend pics and pseudo-pics in reverse order
# 247547	01-Mar-2013	jhb	MFC 245577,245640: Don't attempt to use clflush on the local APIC register window. Various CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on Pentium-M, and trashing of the local APIC registers on a VIA C7). The local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't necessary anyway.
# 225736	22-Sep-2011	kensmith	Copy head to stable/9 as part of 9.0-RELEASE release cycle. Approved by: re (implicit)
# 222813	07-Jun-2011	attilio	etire the cpumask_t type and replace it with cpuset_t usage. This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being. Technical notes on this commit itself: - More functions to handle with cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are target to be removed soon. With the moving from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu datas is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernland members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES. Please additively note that no MAXCPU is bumped in this patch, but private testing has been done until to MAXCPU=128 on a real 8x8x2(htt) machine (amd64). Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC. People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revision of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted review on some specific part of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific part of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately. Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development. (I really hope I didn't forget anyone, if it happened I apologize in advance).
# 221508	05-May-2011	mav	Some changes around LAPIC timer programming. This fixes heavy interrupt storm and resulting system freeze when using LAPIC timer in one-shot mode under Xen HVM. There, unlike real hardware, programming timer with zero period almost immediately causes interrupt.
# 217360	13-Jan-2011	jhb	If an interrupt on an I/O APIC is moved to a different CPU after it has started to execute, it seems that the corresponding ISR bit in the "old" local APIC can be cleared. This causes the local APIC interrupt routine to fail to find an interrupt to service. Rather than panic'ing in this case, simply return from the interrupt without sending an EOI to the local APIC. If there are any other pending interrupts in other ISR registers, the local APIC will assert a new interrupt. Tested by: steve
# 215751	23-Nov-2010	avg	x86/local_apic: use newly added ARAT bit definition ARAT: APIC-Timer-always-running feature. Suggested by: mav MFC after: 12 days
# 215009	08-Nov-2010	jhb	Sync the APIC startup sequence with amd64: - Register APIC enumerators at SI_SUB_TUNABLES - 1 instead of SI_SUB_CPU - 1. - Probe CPUs at SI_SUB_TUNABLES - 1. This allows i386 to set a truly accurate mp_maxid value rather than always setting it to MAXCPU - 1.
# 215001	08-Nov-2010	jhb	Only dump the values of the PMC and CMCI local vector table entries on a local APIC if those LVT entries are valid. This quiets spurious illegal register local APIC errors during boot on a CPU that doesn't support those vectors. MFC after: 1 week
# 214631	01-Nov-2010	jhb	Move <machine/apicreg.h> to <x86/apicreg.h>.
# 214630	01-Nov-2010	jhb	Move the <machine/mca.h> header to <x86/mca.h>.
# 214347	25-Oct-2010	jhb	Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned by intr_disable(). Requested by: bde
# 212541	13-Sep-2010	mav	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy interrupts are generating at full rate of hz + stathz to fullfill scheduler and timekeeping requirements. But when CPU is idle, only minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts is executed. This allows significantly increase idle CPU sleep time, increasing effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle. There is set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there is up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends of chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how much times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU to receive every timer interrupt independently of whether they busy or not. By default this options is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generating. As soon as this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.
# 212004	30-Aug-2010	rpaulo	When DTrace is enabled, make sure we don't overwrite the IDT_DTRACE_RET entry with an IRQ for some hardware component. Reviewed by: jhb Sponsored by: The FreeBSD Foundation
# 211756	24-Aug-2010	mav	Enable timer interrupt before starting timer. This allows to handle very short periods without interrupt loss.
# 210444	24-Jul-2010	mav	Increment td->td_intr_nesting_level for LAPIC timer interrupts. Among other things it hints SCHED_ULE to run clock swi handlers on their native CPUs, avoiding many unneeded IPI_PREEMPT calls.
# 210298	20-Jul-2010	mav	Fix several un-/signedness bugs of r210290 and r210293. Add one more check.
# 210290	20-Jul-2010	mav	Extend timer driver API to report also minimal and maximal supported period lengths. Make MI wrapper code to validate periods in request. Make kernel clock management code to honor these hardware limitations while choosing hz, stathz and profhz values.
# 209990	13-Jul-2010	mav	Rise knowledge about curthread->td_intr_frame by one step. Make timer callback argument really opaque. Not repeat interrupt handler's problem in case somebody will ever need to have both argument and frame.
# 209371	20-Jun-2010	mav	Implement new event timers infrastructure. It provides unified APIs for writing event timer drivers, for choosing best possible drivers by machine independent code and for operating them to supply kernel with hardclock(), statclock() and profclock() events in unified fashion on various hardware. Infrastructure provides support for both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. MI management code at this moment uses only periodic mode, but one-shot mode use planned for later, as part of tickless kernel project. For this moment infrastructure used on i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected. This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support into the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may be not exactly precise. HPET - depending on hardware can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because same hardware used also as time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited by powers of 2. Depending on hardware capabilities, drivers preferred in following orders, either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. User may explicitly specify wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If requested driver is unavailable or unoperational, system will try to replace it. If no more timers available or "NONE" specified for second, system will operate using only one timer, multiplying it's frequency by few times and uing respective dividers to honor hz, stathz and profhz values, set during initial setup.
# 208507	24-May-2010	jhb	Add support for corrected machine check interrupts. CMCI is a new local APIC interrupt that fires when a threshold of corrected machine check events is reached. CMCI also includes a count of events when reporting corrected errors in the bank's status register. Note that individual banks may or may not support CMCI. If they do, each bank includes its own threshold register that determines when the interrupt fires. Currently the code uses a very simple strategy where it doubles the threshold on each interrupt until it succeeds in throttling the interrupt to occur only once a minute (this interval can be tuned via sysctl). The threshold is also adjusted on each hourly poll which will lower the threshold once events stop occurring. Tested by: Sailaja Bangaru sbappana at yahoo com MFC after: 1 month
# 208494	24-May-2010	mav	- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. Same code with minor variations duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.
# 208479	23-May-2010	mav	Restore different APIC init orders for i386 and amd64 unified in r208452. Seems noone of them contents both arch for different reasons. Submitted by: kib@
# 208452	23-May-2010	mav	Unify local_apic.c for x86 archs,
# 206901	20-Apr-2010	rpaulo	Rename the cyclic global variable lapic_cyclic_clock_func to just cyclic_clock_func. This will make more sense when we start developing non x86 cyclic version.
# 205851	29-Mar-2010	jhb	Add a handler for the local APIC error interrupt. For now it just prints out the current value of the local APIC error register when the interrupt fires. MFC after: 1 week
# 204641	03-Mar-2010	attilio	Improving the clocks auto-tunning by firstly checking if the atrtc may be correctly initialized and just then assign to softclock/profclock. Right now, some atrtc seems reporting strange diagnostic error* making the current pattern bogus. In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64, now accepts as arguments the desired sources to handle, and returns the actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no meaning in calling such function). This allows to bring out into commont x86 code the handling part for machdep.lapic_allclocks tunable, which is retained. Sponsored by: Sandvine Incorporated Tested by: yongari, Richard Todd <rmtodd at ichotolot dot servalan dot com> MFC: 3 weeks X-MFC: r202387, 204309
# 202387	15-Jan-2010	attilio	Handling all the three clocks (hardclock, softclock, profclock) with the LAPIC may lead to aliasing for softclock and profclock because frequencies are sized in order to fit mainly hardclock. atrtc used to take care of the softclock and profclock and it does still do, if the LAPIC can't handle the clocks properly. Revert the change when the LAPIC started taking charge of all three of them and let atrtc handle softclock and profclock if not explicitly requested. Such request can be made setting != 0 the new tunable machdep.lapic_allclocks or if the new device ATPIC is not present within the i386 kernel config (atrtc is linked to atpic presence). Diagnosed by: Sandvine Incorporated Reviewed by: jhb, emaste Sponsored by: Sandvine Incorporated MFC: 3 weeks
# 202161	12-Jan-2010	gavin	Spell "Hz" correctly wherever it is user-visible. PR: bin/142566 Submitted by: N.J. Mann njm njm.me.uk Approved by: ed (mentor) MFC after: 2 weeks
# 196745	01-Sep-2009	jhb	Don't attempt to bind the current thread to the CPU an IRQ is bound to when removing an interrupt handler from an IRQ during shutdown. During shutdown we are already bound to CPU 0 and this was triggering a panic. MFC after: 3 days
# 196224	14-Aug-2009	jhb	Adjust the handling of the local APIC PMC interrupt vector: - Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc() routines in the local APIC code that the hwpmc(4) driver can use to manage the local APIC PMC interrupt vector. - Do not enable the local APIC PMC interrupt vector by default when HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly enables the interrupt when it is succesfully initialized and disables the interrupt when it is unloaded. This avoids enabling the interrupt on unsupported CPUs which may result in spurious NMIs. Reported by: rnoland Reviewed by: jkoshy Approved by: re (kib) MFC after: 2 weeks
# 196196	13-Aug-2009	attilio	* Completely Remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using a NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Adds an new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when disponible. In other cases it just does match a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64 architectures, while on the other has a normal IPI_STOP effect. It is responsibility of maintainers to eventually implement an hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). That is specular to stop_cpu() but it does use the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and comments adding This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)
# 195249	01-Jul-2009	jhb	Improve the handling of cpuset with interrupts. - For x86, change the interrupt source method to assign an interrupt source to a specific CPU to return an error value instead of void, thus allowing it to fail. - If moving an interrupt to a CPU fails due to a lack of IDT vectors in the destination CPU, fail the request with ENOSPC rather than panicing. - For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used on the first interrupt in a group. Moving the first interrupt in a group moves the entire group. - Use the icu_lock to protect intr_next_cpu() on x86 instead of the intr_table_lock to fix a LOR introduced in the last set of MSI changes. - Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with interrupts. Previously, binding an interrupt to a CPU only performed a privilege check if the interrupt had an interrupt thread. Interrupts without a thread could be bound by non-root users as a result. - If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. Approved by: re (kib)
# 194889	24-Jun-2009	jhb	Whitespace fix.
# 193804	09-Jun-2009	ariff	Move C1E workaround into its own idle function. Previous workaround works only during initial booting process, while there are laptops/BIOSes that tend to act 'smarter' by force enabling C1E if the main power adapter being pulled out, rendering previous workaround ineffective. Given the fact that we still rely on local APIC to drive timer interrupt, this workaround should keep all Turion (probably Phenom too) X\d+ alive whether its on battery power or not. URL: http://lists.freebsd.org/pipermail/freebsd-acpi/2008-April/004858.html http://lists.freebsd.org/pipermail/freebsd-acpi/2008-May/004888.html Tested by: Peter Jeremy <peterjeremy at optushome d com d au>
# 191803	04-May-2009	mav	Do not try to initialize LAPIC timer if we are not going to use it. It solves assertion, when kernel built with INVARIANTS configured to use i8254 timer.
# 191730	01-May-2009	mav	Small addition to r191720. Restore previous behaviour for the case of unknown interrupt. Invocation of IRQ -1 crashes my system on resume. Returning 0, as it was, is not perfect also, but at least not so dangerous.
# 191720	01-May-2009	mav	Use value -1 instead of 0 for marking unused APIC vectors. This fixes IRQ0 routing on LAPIC-enabled systems. Add hint.apic.0.clock tunable. Setting it 0 disables using LAPIC timers as hard-/stat-/profclock sources falling back to using i8254 and rtc timers. On modern CPUs LAPIC is a part of CPU core which is shutting down when CPU enters C3 or deeper power state. It makes no problems for interrupt processing, as chipset wakes up CPU on interrupt triggering. But entering C3 state kills LAPIC timer and freezes system time, making C3 and deeper states practically unusable. Using i8254 timer allows to avoid this problem. By using i8254 timer my T7700 C2D CPU with UP kernel successfully enters C3 state, saving more then a Watt of total idle power (>10%) in addition to all other power-saving techniques. This technique is not working for SMP yet, as only one CPU receives timer interrupts. But I think that problem could be fixed by forwarding interrupts to other CPUs with IPI.
# 188904	21-Feb-2009	jeff	- Resolve an issue where we may clear an idt while an interrupt on a different cpu is still assigned to that vector by never clearing idt entries. This was only provided as a debugging feature and the bugs are caught by other means. - Drop the sched lock when rebinding to reassign an interrupt vector to a new cpu so that pending interrupts have a chance to be delivered before removing the old vector. Discussed with: tegge, jhb
# 187880	29-Jan-2009	jeff	- Allocate apic vectors on a per-cpu basis. This allows us to allocate more irqs as we have more cpus. This is principally useful on systems with msi devices which may want many irqs per-cpu. Discussed with: jhb Sponsored by: Nokia
# 185933	11-Dec-2008	jhb	Add constants for fields in the local APIC error status register and a routine to read it.
# 185341	26-Nov-2008	jkim	Introduce cpu_vendor_id and replace a lot of strcmp(cpu_vendor, "..."). Reviewed by: jhb, peter (early amd64 version)
# 184372	27-Oct-2008	sobomax	Fix r184323 - set stathz to be the same as lapic_timer_hz when lapic_timer_hz is less than 128. Remove extra {} to match existing style.
# 184293	26-Oct-2008	sobomax	Fix division by zero panic if kern.hz less than 32. MFC after: 1 day
# 182902	10-Sep-2008	kmacy	Get initial bootstrap of APs working under xen. Note that the APs still blow up in sched_throw(). MFC after: 1 month
# 182046	23-Aug-2008	jhb	Adjust the handling the various timer frequencies when using the lapic timer. Previously, the various divisors were fixed which meant that while it gave somewhat reasonable stathz, etc. at hz=1000, it went off the rails with any other hz value. With these changes, we now pick a lapic timer hz based on the value of hz. If hz is >= 1500, then the lapic timer runs at hz. If 1500 hz >= 750, we run the lapic timer at hz * 2. If hz < 750, we run at hz * 4. We compute a divider at runtime to make stathz run as close to 128 as we can since stathz really wants to be run at something close to that frequency. Profiling just runs on every clock tick. So some examples: With hz = 100, the lapic timer now runs at 400 instead of 2000. stathz will be 133, and profhz = 400. With hz = 1000 (default), the lapic timer is still at 2000 (as it is now), stathz is at 133 (as it is now), and profhz will be 2000 (previously 666). MFC after: 2 weeks
# 182021	22-Aug-2008	kmacy	Don't try enumerating APICs when running on top of xen (fixes boot on 64-bit dom0s) MFC after: 1 month
# 179277	24-May-2008	jb	Add the DTrace hooks for exception handling (Function boundary trace -fbt- provider), cyclic clock and syscalls.
# 177253	16-Mar-2008	rwatson	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
# 172144	11-Sep-2007	attilio	This is a follow-up, cleaning-up commit about recent changes involving topology foo functions. Working at the patch for topology problems in ia32/amd64 evicted some problems regarding functions ordering in the SI_SUB_CPU family of SYSINIT'ed subsystems. In order to avoid problems with new modified to involved functions, a correct ordering is not semantically specified for SI_SUB_CPU functions (for a larger view of the issue please visit: http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html ) Discussed with: peter Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org> Approved by: jeff Approved by: re
# 171702	02-Aug-2007	peter	Move mp_topology() from apic_init(i386) and apic_setup_local(amd64) to cpu_start_mp(). This is after we have read the cpuid registers to calculate the hyperthreading_cpus value for the sysctl that enables or disables hyperthread cores. Change mp_topology() to use that information rather than trying to do it itself. This solves the problem of ULE being incorrectly told that dual core Athlon64 X2 or Operton cpus are hyperthreading cores. At the very least, we now have a single piece of code to identify hyperthreading. Obtained from: jhb Approved by: re (kensmith)
# 169395	08-May-2007	jhb	Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses an APIC ID of 38 for its second CPU): - Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern systems. - Size the various arrays in the MADT, MP Table, and SMP code that are indexed by APIC IDs to allow for up to MAX_APIC_ID. - Explicitly go through and assign logical cpu ids to local APICs before starting any of the APs up rather than doing it while starting up the APs. This step is now where we honor MAXCPU. MFC after: 1 week
# 169391	08-May-2007	jhb	Minor fixes and tweaks to the x86 interrupt code: - Split the intr_table_lock into an sx lock used for most things, and a spin lock to protect intrcnt_index. Originally I had this as a spin lock so interrupt code could use it to lookup sources. However, we don't actually do that because it would add a lot of overhead to interrupts, and if we ever do support removing interrupt sources, we can use other means to safely do so w/o locking in the interrupt handling code. - Replace is_enabled (boolean) with is_handlers (a count of handlers) to determine if a source is enabled or not. This allows us to notice when a source is no longer in use. When that happens, we now invoke a new PIC method (pic_disable_intr()) to inform the PIC driver that the source is no longer in use. The I/O APIC driver frees the APIC IDT vector when this happens. The MSI driver no longer needs to have a hack to clear is_enabled during msi_alloc() and msix_alloc() as a result of this change as well. - Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to complement apic_enable_vector() and use it in the I/O APIC and MSI code when freeing an IDT vector. - Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an IRQ to its irq_rman. The MSI code uses this when it creates new interrupt sources to let the nexus know about newly valid IRQs. Previously the msi_alloc() and msix_alloc() passed some extra stuff back to the nexus methods which then added the IRQs. This approach is a bit cleaner. - Change the MSI sx lock to a mutex. If we need to create new sources, drop the lock, create the required number of sources, then get the lock and try the allocation again.
# 169042	25-Apr-2007	ariff	Disable C1 Enhanced mode on AMD K8 Family Revision F and above to keep local APIC timer alive. Reviewed by: jhb PR: i386/104678 MFC after: 3 days
# 167747	20-Mar-2007	jhb	Add a new apic0 psuedo-device to claim memory resources for the memory address ranges used by local and I/O APICs in the system. Some systems also reserve these ranges as system resources via either PnPBIOS or ACPI, so this device currently attaches after acpi0 and legacy0 so that the system resources are given precedence.
# 167273	06-Mar-2007	jhb	Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid)) rather than local APIC IDs to keep track of CPUs which can handle interrupts.
# 167247	05-Mar-2007	jhb	Use vm_paddr_t rather than uintptr_t when passing the physical address of APICs to lapic_init() and ioapic_create().
# 165302	17-Dec-2006	kmacy	Evidently FreeBSD has long relied on the compiler to treat structures passed by value (trap frames) as if they were in fact being passed by reference. For better or worse, this incorrect behaviour is no longer present in gcc 4.1. In this patch I convert all trapframe arguments to be explicitly pass by reference. I also remove vm86_initflags, pushing the very little work that it actually does up into vm86_prepcall. Reviewed by: kan Tested by: kan
# 164265	13-Nov-2006	jhb	MD support for PCI Message Signalled Interrupts on amd64 and i386: - Add a new apic_alloc_vectors() method to the local APIC support code to allocate N contiguous IDT vectors (aligned on a M >= N boundary). This function is used to allocate IDT vectors for a group of MSI messages. - Add MSI and MSI-X PICs. The PIC code here provides methods to manage edge-triggered MSI messages as x86 interrupt sources. In addition to the PIC methods, msi.c also includes methods to allocate and release MSI and MSI-X messages. For x86, we allow for up to 128 different MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs, 16-254 for APIC PCI IRQs, and IRQ 255 is reserved). - Add pcib_(alloc\|release)_msi[x]() methods to the MD x86 PCI bridge drivers to bubble the request up to the nexus driver. - Add pcib_(alloc\|release)_msi[x]() methods to the x86 nexus drivers that ask the MSI PIC code to allocate resources and IDT vectors. MFC after: 2 months
# 163219	10-Oct-2006	jhb	Change the x86 interrupt code to suspend/resume interrupt controllers (PICs) rather than interrupt sources. This allows interrupt controllers with no interrupt pics (such as the 8259As when APIC is in use) to participate in suspend/resume. - Always register the 8259A PICs even if we don't use any of their pins. - Explicitly reset the 8259As on resume on amd64 if 'device atpic' isn't included. - Add a "dummy" PIC for the local APIC on the BSP to reset the local APIC on resume. This gets suspend/resume working with APIC on UP systems. SMP still needs more work to bring the APs back to life. The MFC after is tentative. Tested by: anholt (i386) Submitted by: Andrea Bittau <a.bittau at cs.ucl.ac.uk> (3) MFC after: 1 week
# 162713	27-Sep-2006	sobomax	Extend comment explaining why code is conditional at !defined(SCHED_ULE). Suggested by: ru
# 162708	27-Sep-2006	sobomax	Since ULE doesn't honor hlt_cpus_mask don't compile code that prevents timer interrupt servicing for disabled HTT cores in ULE case. Should be probably fixed in ULE code instead, but we have no real maintainer for ULE to do it. PR: 103697
# 162233	11-Sep-2006	jhb	Add a new ddb command 'show lapic' to dump details about the local APIC registers for the current CPU. MFC after: 3 days
# 162087	06-Sep-2006	sobomax	Unbreak in the case when device apic is compiled into non-SMP kernel. Reported by: jhay MFC after: 2 weeks
# 162042	05-Sep-2006	sobomax	The FreeBSD by default "disables" hyper-threading cores, by not scheduling any threads to them. However, it still counts those cores as "active but permanently idle" when calculating system-wide CPUs statistics. It is incorrect, since it skews statistics quite a bit and creates real problems for certain types of applications (monitoring applications for example), by making them believe that the system does have enough idle CPU resources, while in fact it does not. Correct the problem by not calling performance counting routines on "disabled" cores. The cleaner solution would be to just disable APIC timer interrupts on those cores completely, but ENOTIME here and it is not clear if the additional complexity really worth minor performance gain. Reviewed by: ssouhlal Sponsored by: Sippy Software, Inc. MFC after: 2 weeks
# 160312	12-Jul-2006	jhb	Simplify the pager support in DDB. Allowing different db commands to install custom pager functions didn't actually happen in practice (they all just used the simple pager and passed in a local quit pointer). So, just hardcode the simple pager as the only pager and make it set a global db_pager_quit flag that db commands can check when the user hits 'q' (or a suitable variant) at the pager prompt. Also, now that it's easy to do so, enable paging by default for all ddb commands. Any command that wishes to honor the quit flag can do so by checking db_pager_quit. Note that the pager can also be effectively disabled by setting $lines to 0. Other fixes: - 'show idt' on i386 and pc98 now actually checks the quit flag and terminates early. - 'show intr' now actually checks the quit flag and terminates early.
# 156920	20-Mar-2006	jhb	Drop some unneeded casts since we program the kernel in C rather than C++.
# 156124	28-Feb-2006	jhb	Rework how we wire up interrupt sources to CPUs: - Throw out all of the logical APIC ID stuff. The Intel docs are somewhat ambiguous, but it seems that the "flat" cluster model we are currently using is only supported on Pentium and P6 family CPUs. The other "hierarchy" cluster model that is supported on all Intel CPUs with local APICs is severely underdocumented. For example, it's not clear if the OS needs to glean the topology of the APIC hierarchy from somewhere (neither ACPI nor MP Table include it) and setup the logical clusters based on the physical hierarchy or not. Not only that, but on certain Intel chipsets, even though there were 4 CPUs in a logical cluster, all the interrupts were only sent to one CPU anyway. - We now bind interrupts to individual CPUs using physical addressing via the local APIC IDs. This code has also moved out of the ioapic PIC driver and into the common interrupt source code so that it can be shared with MSI interrupt sources since MSI is addressed to APICs the same way that I/O APIC pins are. - Interrupt source classes grow a new method pic_assign_cpu() to bind an interrupt source to a specific local APIC ID. - The SMP code now tells the interrupt code which CPUs are avaiable to handle interrupts in a simpler and more intuitive manner. For one thing, it means we could now choose to not route interrupts to HT cores if we wanted to (this code is currently in place in fact, but under an #if 0 for now). - For now we simply do static round-robin of IRQs to CPUs when the first interrupt handler just as before, with the change that IRQs are now bound to individual CPUs rather than groups of up to 4 CPUs. - Because the IRQ to CPU mapping has now been moved up a layer, it would be easier to manage this mapping from higher levels. For example, we could allow drivers to specify a CPU affinity map for their interrupts, or we could allow a userland tool to bind IRQs to specific CPUs. The MFC is tentative, but I want to see if this fixes problems some folks had with UP APIC kernels on 6.0 on SMP machines (an SMP kernel would work fine, but a UP APIC kernel (such as GENERIC in RELENG_6) would lose interrupts). MFC after: 1 week
# 153666	22-Dec-2005	jhb	Tweak how the MD code calls the fooclock() methods some. Instead of passing a pointer to an opaque clockframe structure and requiring the MD code to supply CLKF_FOO() macros to extract needed values out of the opaque structure, just pass the needed values directly. In practice this means passing the pair (usermode, pc) to hardclock() and profclock() and passing the boolean (usermode) to hardclock_cpu() and hardclock_process(). Other details: - Axe clockframe and CLKF_FOO() macros on all architectures. Basically, all the archs were taking a trapframe and converting it into a clockframe one way or another. Now they can just extract the PC and usermode values directly out of the trapframe and pass it to fooclock(). - Renamed hardclock_process() to hardclock_cpu() as the latter is more accurate. - On Alpha, we now run profclock() at hz (profhz == hz) rather than at the slower stathz. - On Alpha, for the TurboLaser machines that don't have an 8254 timecounter, call hardclock() directly. This removes an extra conditional check from every clock interrupt on Alpha on the BSP. There is probably room for even further pruning here by changing Alpha to use the simplified timecounter we use on x86 with the lapic timer since we don't get interrupts from the 8254 on Alpha anyway. - On x86, clkintr() shouldn't ever be called now unless using_lapic_timer is false, so add a KASSERT() to that affect and remove a condition to slightly optimize the non-lapic case. - Change prototypeof arm_handler_execute() so that it's first arg is a trapframe pointer rather than a void pointer for clarity. - Use KCOUNT macro in profclock() to lookup the kernel profiling bucket. Tested on: alpha, amd64, arm, i386, ia64, sparc64 Reviewed by: bde (mostly)
# 153383	13-Dec-2005	jhb	Revert previous commit. The BIOS braindamage is even worse than I originally thought. The BIOS that cleared CPUID_APIC actually managed to disable the local APIC entirely and even Windows 64 doesn't boot on it. Reported by: bz
# 153377	13-Dec-2005	jhb	Don't check the CPUID_APIC bit in the cpu_features flags field to determine if the boot CPU has a local APIC because some BIOS vendors are not competent enough to set this bit. Instead, just assume that we always have a local APIC on amd64. For i386 the check is a bit more subtle. FreeBSD requires either an MP Table or an ACPI MADT table to enumerate APICs. The only systems that have one of those tables that don't have local APICs are some presumably rare (and old) SMP 486 systems using external APICs. Thus, instead of checking the CPUID_APIC flag, check the CPU class and abort if we are running on a 486. MFC after: 1 week Reported by: bz
# 153146	05-Dec-2005	jhb	Change the i386 code to pass the interrupt vector as a separate argument rather than embedding it in the intrframe as if_vec. This reduces diffs with amd64 somewhat. - Remove cf_vec from clockframe (it wasn't used anyway) and stop pushing dummy vector arguments for ipi_bitmap_handler() and lapic_handle_timer() since clockframe == trapframe now. - Fix ddb to handle stack traces across interrupt entry points that just have a trapframe on their stack and not a trapframe + vector. - Change intr_execute_handlers() to take a trapframe rather than an intrframe pointer. - Change lapic_handle_intr() and atpic_handle_intr() to take a vector and trapframe rather than an intrframe. - GC struct intrframe now that nothing uses it anymore. - GC CLOCK_TO_TRAPFRAME() and INTR_TO_TRAPFRAME(). Reviewed by: bde Requested by: peter
# 153141	05-Dec-2005	jhb	- Move the code to deal with handling an IPI_STOP IPI out of ipi_nmi_handler() and into a new cpustop_handler() function. Change the Xcpustop IPI_STOP handler to call this function instead of duplicating all the same logic in assembly. - EOI the local APIC for the lapic timer interrupt in C rather than assembly. - Bump the lazypmap IPI counter if COUNT_IPIS is defined in C rather than assembly.
# 151979	02-Nov-2005	jhb	Change the x86 code to allocate IDT vectors on-demand when an interrupt source is first enabled similar to how intr_event's now allocate ithreads on-demand. Previously, we would map IDT vectors 1:1 to IRQs. Since we only have 191 available IDT vectors for I/O interrupts, this limited us to only supporting IRQs 0-190 corresponding to the first 190 I/O APIC intpins. On many machines, however, each PCI-X bus has its own APIC even though it only has 1 or 2 devices, thus, we were reserving between 24 and 32 IRQs just for 1 or 2 devices and thus 24 or 32 IDT vectors. With this change, a machine with 100 IRQs but only 5 in use will only use up 5 IDT vectors. Also, this change provides an API (apic_alloc_vector() and apic_free_vector()) that will allow a future MSI interrupt source driver to request IDT vectors for use by MSI interrupts on x86 machines. Tested on: amd64, i386
# 150696	28-Sep-2005	jhb	Rename the lapic timer interrupt counters from lapicX: timer to cpuX: timer since it's not always obvious that lapic == cpu. MFC after: 3 days
# 150176	15-Sep-2005	jhb	- Adjust a comment, we do program the performance counter LVT entry now if hwpmc(4) is included. - Don't recursively panic if we are unable to send an IPI, just bail and hope for the best. MFC after: 1 week
# 147565	23-Jun-2005	peter	Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it. Approved by: re
# 145256	19-Apr-2005	jkoshy	Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)
# 145055	14-Apr-2005	jhb	Always use the local APIC timer, even on UP machines.
# 143034	02-Mar-2005	jhb	Tweak the lapic timer code to get the performance closer to the pre-lapic timer case: - Remove the virtual fooclock interrupt counters as they have served their purpose. - Adjust the dividers for the different clock such that profhz is now a multiple of stathz as in the non-lapic case, and the timer now runs at hz * 2 rather than hz * 3. With the new divisors, the default clock rates are: kern.clockrate: { hz = 1000, tick = 1000, profhz = 666, stathz = 133 }
# 141538	08-Feb-2005	jhb	Use the local APIC timer to drive the various kernel clocks on SMP machines rather than forwarding interrupts from the clock devices around using IPIs: - Add an IDT vector that pushes a clock frame and calls lapic_handle_timer(). - Add functions to program the local APIC timer including setting the divisor, and setting up the timer to either down a periodic countdown or one-shot countdown. - Add a lapic_setup_clock() function that the BSP calls from cpu_init_clocks() to setup the local APIC timer if it is going to be used. The setup uses a one-shot countdown to calibrate the timer. We then program the timer on each CPU to fire at a frequency of hz * 3. stathz is defined as freq / 23 (hz * 3 / 23), and profhz is defined as freq / 2 (hz * 3 / 2). This gives the clocks relatively prime divisors while keeping a low LCM for the frequency of the clock interrupts. Thanks to Peter Jeremy for suggesting this approach. - Remove the hardclock and statclock forwarding code including the two associated IPIs. The bitmap IPI handler has now effectively degenerated to just IPI_AST. - When the local APIC timer is used we don't turn the RTC on at all, but we still enable interrupts on the ISA timer 0 (i8254) for timecounting purposes.
# 140254	14-Jan-2005	jhb	Drop the 'active-' prefix from the polarity printf to be consistent with the rest of the interrupt code.
# 139245	23-Dec-2004	jhb	- Give the timer, thermal, and error LVT entries an interrupt vector even though these aren't used yet. - Add missing function prototypes for some static functions. - Allow lvt_mode() to handle an LVT entry with a delivery mode of fixed. - Consolidate code duplicated in lapic_init() and lapic_setup() to program the spurious vector register of a local APIC in a static lapic_enable() function. - Dump the timer, thermal, error, and performance counter LVT entries during lapic_dump(). - Program LVT pins (currently only LINT0 and LINT1) after the local APIC has been software enabled via lapic_enable() since otherwise the LVT programming will not be able to unmask LVT sources.
# 139240	23-Dec-2004	jhb	- Add a function to set the Task Priority Register (TPR) of the local APIC. Currently this is only used to initiailize the TPR to 0 during initial setup. - Reallocate vectors for the local APIC timer, error, and thermal LVT entries. The timer entry is allocated from the top of the I/O interrupt range reducing the number of vectors available for hardware interrupts to 191. Linux happens to use the same exact vector for its timer interrupt as well. If the timer vector shared the same priority queue as the IPI handlers, then the frequency that the timer vector will eventually be firing at can interact badly with the IPIs resulting in the queue filling and the dreaded IPI stuck panics, hence it being located at the top of the previous priority queue instead. - Fixup various minor nits in comments.
# 132156	14-Jul-2004	jhb	Correct bounds check in lapic_create(). Submitted by: "Ted Unangst" tedu at coverity.com
# 128930	04-May-2004	jhb	- Change the APIC code to mostly use the recently added intr_trigger and intr_polarity enums for passing around interrupt trigger modes and polarity rather than using the magic numbers 0 for level/low and 1 for edge/high. - Convert the mptable parsing code to use the new ELCR wrapper code rather than reading the ELCR directly. Also, use the ELCR settings to control both the trigger and polarity of EISA IRQs instead of just the trigger mode. - Rework the MADT's handling of the ACPI SCI again: - If no override entry for the SCI exists at all, use level/low trigger instead of the default edge/high used for ISA IRQs. - For the ACPI SCI, use level/low values for conforming trigger and polarity rather than the edge/high values we use for all other ISA IRQs. - Rework the tunables available to override the MADT. The hw.acpi.force_sci_lo tunable is no longer supported. Instead, there are now two tunables that can independently override the trigger mode and/or polarity of the SCI. The hw.acpi.sci.trigger tunable can be set to either "edge" or "level", and the hw.acpi.sci.polarity tunable can be set to either "high" or "low". To simulate hw.acpi.force_sci_lo, set hw.acpi.sci.trigger to "level" and hw.acpi.sci.polarity to "low". If you are having problems with ACPI either causing an interrupt storm or not working at all (e.g., the power button doesn't turn invoke a shutdown -p now), you can try tweaking these two tunables to find the combination that works.
# 125317	02-Feb-2004	jeff	- Make sure the apic is idle before sending an IPI. This is required on non-X-APIC machines. Previously this was only done in the DETECT_DEADLOCK case when really it is needed in all cases. Reminded by: jhb
# 124945	25-Jan-2004	jeff	- Don't define DETECT_DEADLOCK. I don't know that this code has detected a deadlock in several years. Furthermore, the IPI code is currently protected by a seperate spinlock. This only served to make IPIs twice as expensive as they had to be which severely slowed down the IPI heavy ULE scheduler.
# 123432	11-Dec-2003	jeff	- Call mp_topology() after all CPUs have been probed.
# 123133	03-Dec-2003	jhb	- Reorder the APIC enumerator SYSINIT's to register enumeators at SI_SUB_CPU - 1 and probe enumerators, probe CPUs, and setup the local APIC programming all at SI_SUB_CPU / SI_ORDER_FIRST. This is needed to help get the ACPI module working again as it moves the APIC enumeration code after SI_SUB_KLD. - In the MADT parser, use mp_maxid rather than MAXCPU to terminate a loop when assigning per-cpu ACPI IDs to avoid a dependency on 'options SMP'. - Allow the apic device to be disabled via 'hint.apic.0.disabled' from the loader. Note that since this is done in the local APIC code, it works for both the ACPI and non-ACPI cases. Approved by: re (scott / blanket)
# 122690	14-Nov-2003	jhb	Shuffle the APIC interrupt vectors around a bit: - Move the IPI and local APIC interrupt vectors up into the 0xf0 - 0xff range. The pmap lazyfix IPI was reordered down next to the TLB shootdowns to avoid conflicting with the spurious interrupt vector. - Move the base of APIC interrupts up 16 so that the first 16 APIC interrupts do not overlap the vectors used by the ATPIC. - Remove bogus interrupt vector reservations for LINT[01]. - Now that 0xc0 - 0xef are available, use them for device interrupts. This increases the number of APIC device interrupts to 191. - Increase the system-wide number of global interrupts to 191 to catch up to more APIC interrupts. Requested by: peter (2)
# 122572	12-Nov-2003	jhb	- Move manipulation of td_intr_nesting_level out of assembly interrupt vector stubs and into the C functions they call. - Move disabling and EOIing of interrupt sources out of PIC driver entry points and into intr_execute_handlers(). Intr_execute_handlers() only disables a source for an interrupt if it is a stray interrupt or has threaded handlers. Sources with fast handlers no longer disable (mask) the source while executing the handlers. - Move the setting of clkintr_pending into intr_execute_handlers() and set the variable for any interrupt source with a vector of 0. (Should only be true for IRQ 0.) This fixes clkintr_pending in the NO_MIXED_MODE case. - Implement lapic_eoi() and use it to implement ioapic_eoi_source(). - Rename atpic_sched_ithd() to atpic_handle_intr() since it is used to handle all atpic interrupts and not just threaded ones. Inspired by: peter's changes to amd64 in p4 (1) Requested by: bde (2)
# 121986	03-Nov-2003	jhb	New APIC support code: - The apic interrupt entry points have been rewritten so that each entry point can serve 32 different vectors. When the entry is executed, it uses one of the 32-bit ISR registers to determine which vector in its assigned range was triggered. Thus, the apic code can support 159 different interrupt vectors with only 5 entry points. - We now always to disable the local APIC to work around an errata in certain PPros and then re-enable it again if we decide to use the APICs to route interrupts. - We no longer map IO APICs or local APICs using special page table entries. Instead, we just use pmap_mapdev(). We also no longer export the virtual address of the local APIC as a global symbol to the rest of the system, but only in local_apic.c. To aid this, the APIC ID of each CPU is exported as a per-CPU variable. - Interrupt sources are provided for each intpin on each IO APIC. Currently, each source is given a unique interrupt vector meaning that PCI interrupts are not shared on most machines with an I/O APIC. That mapping for interrupt sources to interrupt vectors is up to the APIC enumerator driver however. - We no longer probe to see if we need to use mixed mode to route IRQ 0, instead we always use mixed mode to route IRQ 0 for now. This can be disabled via the 'NO_MIXED_MODE' kernel option. - The npx(4) driver now always probes to see if a built-in FPU is present since this test can now be performed with the new APIC code. However, an SMP kernel will panic if there is more than one CPU and a built-in FPU is not found. - PCI interrupts are now properly routed when using APICs to route interrupts, so remove the hack to psuedo-route interrupts when the intpin register was read. - The apic.h header was moved to apicreg.h and a new apicvar.h header that declares the APIs used by the new APIC code was added.