History log of /freebsd-9.3-release/sys/ia64/ia64/interrupt.c
Revision Date Author Comments
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 224114 16-Jul-2011 marcel

Don't send EOI to the CPU before we have handled the interrupt. Doing so
could potentially trigger multiple pending interrupts for level-sensitive
interrupts. However, the event timer interrupt does need EOI before
being handled to avoid missing clock events.

These conflicting requirements are handled by having the XIV handler
inform the dispatch code whether or not it sent EOI to the CPU. If not,
the dispatch code will do it. This allows handlers to send EOI before
doing potentially long-running activities, while still having a sensible
default behaviour.
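
A minimal sketch of that handshake, using hypothetical names (handler
table, handler type and EOI helper are all illustrative, not the
committed code):

#include <sys/types.h>

struct trapframe;

/* Hypothetical handler type: returns non-zero if it already sent EOI. */
typedef int	xiv_handler_t(struct trapframe *, void *);

struct xiv_entry {
	xiv_handler_t	*xe_func;
	void		*xe_arg;
};

static struct xiv_entry xiv_table[256];		/* one slot per XIV */

/* Stand-in for the cr.eoi write; the kernel has its own accessor. */
static void
cpu_send_eoi(void)
{
}

static void
xiv_dispatch(struct trapframe *tf, u_int xiv)
{
	struct xiv_entry *xe = &xiv_table[xiv];

	/*
	 * Run the handler first.  Only signal EOI from the dispatcher when
	 * the handler reports that it did not do so itself (the event
	 * timer handler EOIs early to avoid losing one-shot clock events).
	 */
	if (xe->xe_func == NULL || xe->xe_func(tf, xe->xe_arg) == 0)
		cpu_send_eoi();
}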


# 223529 25-Jun-2011 marcel

Replace the original copyright notice with my own. Everything in
this file is written by me and has no bearing on the initial or
original version.


# 223526 25-Jun-2011 marcel

Switch to the event timers infrastructure. This includes:
o Setting td_intr_frame to the XIV's trap frame because it's referenced
by the ET event handler.
o Signal EOI to the CPU before calling the registered XIV handlers.
This prevents lost ITC interrupts, which cause starvation in one-shot
mode.
o Adding support for IPI_HARDCLOCK with corresponding per-CPU counters.
o Have the APs call cpu_initclocks() so as to limit the scattering of
clock related initialization. cpu_initclocks() calls the <self>_bsp()
or <self>_ap() version accordingly.
o Uncomment the ET clock handling in cpu_idle().
o Update the DDB 'show pcpu' output for the new MD fields.
o Entirely rewrote ia64_ih_clock(). Note that we don't create as many
clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale.
We can only have 240 XIVs and we can have more CPUs than that. There's
a single intrcnt index for the cumulative clock ticks and we keep per
CPU counts in the PCPU stats structure.
o Register the ITC by hooking SI_SUB_CONFIGURE (2nd order).

Open issues:
o Clock interrupts can still be lost. Some tweaking is still necessary.

Thanks to: mav@ for his support, feedback and explanations.

ET stats while committing:
eris% sysctl machdep.cpu | grep nclks

machdep.cpu.0.nclks: 24007
machdep.cpu.1.nclks: 22895
machdep.cpu.2.nclks: 13523
machdep.cpu.3.nclks: 9342
machdep.cpu.4.nclks: 9103
machdep.cpu.5.nclks: 9298
machdep.cpu.6.nclks: 10039
machdep.cpu.7.nclks: 9479
eris% vmstat -i | grep clock
clock 108599 50


# 205726 27-Mar-2010 marcel

Implement interrupt to CPU binding. Assign interrupts to CPUs in a
round-robin fashion, starting with the highest priority interrupt
on the highest-numbered CPU and cycling downwards.
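
A minimal sketch of that round-robin policy (illustrative only; the
committed code keeps its own bookkeeping): interrupts are visited in
priority order, highest first, and each is bound to the CPU returned by
a helper like the one below.

#include <sys/types.h>

/*
 * Pick the next CPU for an interrupt, counting down from the
 * highest-numbered CPU and wrapping around.
 */
static u_int
intr_next_cpu(u_int ncpus)
{
	static u_int cpu = 0;

	cpu = (cpu == 0) ? ncpus - 1 : cpu - 1;
	return (cpu);
}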


# 205713 26-Mar-2010 marcel

Rename disable_intr() to ia64_disable_intr() and rename enable_intr()
to ia64_enable_intr(). This reduces confusion with intr_disable() and
intr_restore().

Have configure_final() call ia64_finalize_intr() instead of enable_intr()
in preparation of adding support for binding interrupts to all CPUs.


# 205433 22-Mar-2010 marcel

Fix interrupt handling by extending the critical region so that
preemption doesn't happen until after all pending interrupts have
been serviced.
While here, again simplify the EOI handling by doing it after we
call the XIV-specific handlers, rather than in each of them. The
original thought was that we may want to do an EOI first and the
actual IPI handling next, but that's mostly a micro-optimization.


# 205234 16-Mar-2010 marcel

Revamp the interrupt code based on the previous commit:
o Introduce XIV, eXternal Interrupt Vector, to differentiate from
the interrupt vectors that are offsets in the IVT (Interrupt
Vector Table). There's a vector for external interrupts, which
are based on the XIVs.

o Keep track of allocated and reserved XIVs so that we can assign
XIVs without hardcoding anything. When XIVs are allocated, an
interrupt handler and a class is specified for the XIV. Classes
are:
1. architecture-defined: XIV 15 is returned when no external
interrupts are pending,
2. platform-defined: SAL reports which XIV is used to wake up
an AP (typically 0xFF, but it's 0x12 for the Altix 350).
3. inter-processor interrupts: allocated for SMP support and
non-redirectable.
4. device interrupts (i.e. IRQs): allocated when devices are
discovered and are redirectable. (A sketch of this XIV
bookkeeping follows at the end of this entry.)

o Rewrite the central interrupt handler to call the per-XIV
interrupt handler and rename it to ia64_handle_intr(). Move
the per-XIV handler implementation to the file where we have
the XIV allocation/reservation. Clock interrupt handling is
moved to clock.c. IPI handling is moved to mp_machdep.c.

o Drop support for the Intel 8259A because it was broken. When
XIV 0 is received, the CPU should initiate an INTA cycle to
obtain the interrupt vector of the 8259-based interrupt. In
these cases the interrupt controller we should be talking to
WRT masking and signalling EOI is the 8259 and not the I/O
SAPIC. This requires a driver for the Intel 8259A, which isn't
available for ia64. Thus stop pretending to support ExtINTs
and instead panic(), so that if we come across hardware that
has an Intel 8259A we have something real to work with.

o With XIVs for IPIs dynamically allocated and also based on
priority, define the IPI_* symbols as variables rather than
constants. The variable holds the XIV allocated for the IPI.

o IPI_STOP_HARD delivers an NMI if possible. Otherwise the XIV
assigned to IPI_STOP is delivered.
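
A sketch of what the XIV bookkeeping mentioned above can look like; the
type, field and constant names here are illustrative, not the committed
structures:

#include <sys/types.h>

struct trapframe;
typedef void	xiv_func_t(struct trapframe *, void *);

/* The four classes named above. */
enum xiv_class {
	XIV_ARCH,	/* architecture-defined, e.g. XIV 15 (spurious) */
	XIV_PLAT,	/* platform-defined, e.g. the AP wake-up XIV */
	XIV_IPI,	/* inter-processor interrupts, not redirectable */
	XIV_IRQ		/* device interrupts, redirectable */
};

struct xiv_desc {
	enum xiv_class	 xd_class;
	xiv_func_t	*xd_func;
	int		 xd_used;
};

static struct xiv_desc xiv_descs[256];

/*
 * Allocate a free XIV for a handler, scanning from the highest vector
 * (highest priority) downwards; XIVs 0-15 are left to the architecture.
 */
static int
xiv_alloc(enum xiv_class cls, xiv_func_t *func)
{
	int xiv;

	for (xiv = 255; xiv >= 16; xiv--) {
		if (!xiv_descs[xiv].xd_used) {
			xiv_descs[xiv].xd_used = 1;
			xiv_descs[xiv].xd_class = cls;
			xiv_descs[xiv].xd_func = func;
			return (xiv);
		}
	}
	return (-1);	/* all 240 assignable XIVs are in use */
}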


# 204425 27-Feb-2010 marcel

Interrupt related cleanups:
o Assign vectors based on priority, because vectors have
implied priority in hardware.
o Use unordered memory accesses to the I/O sapic and use
the acceptance form of the mf instruction.
o Remove the sapicreg.h and sapicvar.h headers. All definitions
in sapicreg.h are private to sapic.c and all definitions in
sapicvar.h are either private or interface functions. Move the
interface functions to intr.h.
o Hide the definition of struct sapic.


# 203883 14-Feb-2010 marcel

Some code churn:
o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used
to call either bus_space_map() or pmap_mapdev().
o Implement bus_space_map() in terms of pmap_mapdev() and implement
bus_space_unmap() in terms of pmap_unmapdev().
o Have ia64_pib hold the uncached virtual address of the processor interrupt
block throughout the kernel's life and access the elements of the PIB
through this structure pointer.

This is a non-functional change with the exception of using ia64_ld1() and
ia64_st8() to write to the PIB. We were still using assignments, for which
the compiler generates semaphore reads -- which cause undefined behaviour
for uncacheable memory. Note also that the memory barriers in ipi_send() are
critical for proper functioning.

With all the mapping of uncached memory done by pmap_mapdev(), we can keep
track of the translations and wire them in the CPU. This then eliminates
the need to reserve a whole region for uncached I/O and it eliminates
translation traps for device I/O accesses.
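
A minimal sketch of the PIB store pattern this refers to, assuming the
ia64_st8()/ia64_mf() accessors live in <machine/ia64_cpu.h>; this is
illustrative, not the committed ipi_send():

#include <sys/types.h>
#include <machine/ia64_cpu.h>	/* assumed home of ia64_st8() and ia64_mf() */

/*
 * Post an interrupt by writing the XIV into a processor's PIB slot.
 * The explicit 8-byte store avoids the compiler-generated semaphore
 * reads mentioned above, and the fences order the uncacheable write.
 */
static void
pib_post_ipi(void *pib_slot, uint64_t xiv)
{
	ia64_mf();			/* order earlier stores before the post */
	ia64_st8(pib_slot, xiv);	/* explicit store to uncacheable memory */
	ia64_mf();			/* make the post visible right away */
}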


# 200207 07-Dec-2009 marcel

Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id
excluded, as it's used by MI code) and move the sysctl variables from
pcpu_stats to pcpu_md.
Adjust all references accordingly.

While nearby, change the PCPU sysctl tree so that it matches the CPU
device sysctl tree -- the per-CPU nodes are now children of a static
node called "machdep.cpu" and are named only with their CPU ID.


# 199893 28-Nov-2009 marcel

Eliminate the use of MAXCPU in static arrays of interrupt counters by
adding statistics counters to the PCPU structure. Export the counters
through sysctl by giving each PCPU structure its own sysctl context.

While here, fix cnt.v_intr by having it count every interrupt rather than
just clock interrupts, and add more counters for each interrupt source.


# 199727 23-Nov-2009 marcel

Improve upon revision 196196 by removing the newly added comment
in the wrong place and instead add a KASSERT in the right place.


# 198733 31-Oct-2009 marcel

Reimplement the lazy FP context switching:
o Move all code into a single file for easier maintenance.
o Use a single global lock to avoid having to handle either
multiple locks or race conditions.
o Make sure to disable the high FP registers after saving
or dropping them.
o Use msleep() to wait for the other CPU to save the high
FP registers.

This change fixes the high FP inconsistency panics.

A single global lock typically serializes too much, which may
be noticeable when a lot of threads use the high FP registers,
but in that case it's probably better to switch the high FP
context synchronously. Put differently: cpu_switch() should
switch the high FP registers if the incoming and outgoing
threads both use the high FP registers.


# 196196 13-Aug-2009 attilio

* Completely remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB via an NMI,
but it completely violates the rules about keeping interrupts
disabled while holding a spinlock in other situations. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Add a new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when available. Otherwise it just matches a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses an NMI on the ia32 and amd64
architectures, while on the others it has a normal IPI_STOP effect. It is
the responsibility of maintainers to eventually implement a hard stop
where necessary and possible.
* Use the new IPI facility to implement a new SMP kernel function
called stop_cpus_hard(). It mirrors stop_cpus() but uses the
privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases.
* Disable interrupts on CPU0 when starting the process of suspending the APs.
* Style cleanup and comment additions.

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to remove the STOP_NMI option from your kernel
config file.

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)


# 183439 28-Sep-2008 marius

Remove ipi_all() and ipi_self(), as the former hasn't been used at
all to date and the latter is only used in ia64 and powerpc code
which no longer serves a real purpose after bring-up and can just
be removed as well. Note that architectures like sun4u also
provide no means of natively IPI'ing a CPU itself in the
first place.

Suggested by: jhb
Reviewed by: arch, grehan, jhb


# 179256 23-May-2008 marcel

Account for IPI_PREEMPT. We don't want to call sched_preempt() with
interrupts disabled or with td_intr_nesting_level non-zero.


# 178215 15-Apr-2008 marcel

Support and switch to the ULE scheduler:
o Implement IPI_PREEMPT,
o Set td_lock for the thread being switched out,
o For ULE & SMP, loop while td_lock points to blocked_lock for
the thread being switched in,
o Enable ULE by default in GENERIC and SKI.


# 178131 11-Apr-2008 jeff

- Pass the irq and not the vector to intr_event_create().

Reviewed by: marcel


# 178092 11-Apr-2008 jeff

- Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.

Reviewed by: jhb
Sponsored by: Nokia


# 177940 05-Apr-2008 jhb

Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describing what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).

Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64


# 177325 17-Mar-2008 jhb

Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines to trim one extra indirection.

Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}


# 177181 14-Mar-2008 jhb

Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:

intr_bind(rman_get_start(irq_res), cpu);

however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.

Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}


# 173799 21-Nov-2007 scottl

Extend critical section coverage in the low-level interrupt handlers to
include the ithread scheduling step. Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled. Since the interrupt handler runs in the context of curthread,
the scheduler might see it as having such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state. The only way that the preemption can happen is by
a fast/filter handler triggering a scheduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler. Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example. This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported. Many thanks to Sam
Leffler for getting to the bottom of this problem.

Reviewed by: jhb, jeff, silby


# 171739 06-Aug-2007 marcel

Keep interrupts disabled while handling external interrupts.
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.

While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.

Further simplify assembly code by dealing with ASTs from C.

Approved by: re (blanket)


# 171664 30-Jul-2007 marcel

Rework the interrupt code and add support for interrupt filtering
(INTR_FILTER). This includes:
o Save a pointer to the sapic structure and IRQ for every vector,
so that we can quickly EOI, mask and unmask the interrupt.
o Add locking to the sapic code now that we can reprogram a
sapic on multiple CPUs at the same time.
o Use u_int for the vector and IRQ. We only have 256 vectors, so
using a 64-bit type for it is rather excessive.
o Properly handle concurrent registration of a handler for the
same vector.

Since vectors have a corresponding priority, we should not map
IRQs to vectors in a linear fashion, but rather pick a vector
that has a priority in line with the interrupt type. This is left
for later. The vector/IRQ interchange has been untangled as much
as possible to make this easier.

Approved by: re (blanket)


# 170291 04-Jun-2007 attilio

Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)


# 170162 31-May-2007 piso

In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb


# 166901 23-Feb-2007 piso

o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@


# 164392 18-Nov-2006 marcel

Now that printf() needs the PCPU, set it up before we call printf().
Change the pc_pcb field from a pointer to struct pcb to struct pcb
so that sizeof(struct pcb) includes the PCB we use for IPI_STOP.
Statically declare early_pcb so that we don't have to allocate the
PCB for thread0. This way we can setup the PCPU before cninit()
and thus before we use printf().


# 157449 03-Apr-2006 marcel

Improve handling of IPI_STOP:
o use atomic operations to fiddle with stopped_cpus and started_cpus.
o disable interrupts while we're waiting to be started.
o remove logic relating to cpustop_restartfunc as it's not used.
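
A minimal sketch of that pattern, with stand-in variables (the kernel's
own masks and MD helpers differ):

#include <sys/types.h>
#include <machine/atomic.h>

/* Stand-ins for the kernel's CPU masks. */
static volatile u_int stopped_cpus, started_cpus;

/*
 * Park the current CPU after an IPI_STOP.  Interrupts are assumed to be
 * disabled for the whole wait, per the second item above.
 */
static void
cpu_stop_self(u_int cpuid)
{
	u_int mask = 1u << cpuid;

	atomic_set_int(&stopped_cpus, mask);
	while ((started_cpus & mask) == 0)
		;	/* spin until another CPU restarts us */
	atomic_clear_int(&started_cpus, mask);
	atomic_clear_int(&stopped_cpus, mask);
}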


# 153666 22-Dec-2005 jhb

Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that effect and remove a condition
to slightly optimize the non-lapic case.
- Change the prototype of arm_handler_execute() so that its first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)


# 151885 30-Oct-2005 marcel

Remove a stray return statement in the interrupt dispatch function
that caused a premature exit after calling a fast interrupt handler
and bypassing a much needed critical_exit() and the scheduling of
the interrupt thread for non-fast handlers. In short: unbreak :-)


# 151658 25-Oct-2005 jhb

Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)

Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)


# 149915 09-Sep-2005 marcel

Change the High FP lock from a sleep lock to a spin lock. We can
take the lock from interrupt context, which causes an implicit
lock order reversal. We've been using the lock carefully enough
that making it a spin lock should not be harmful.


# 148807 06-Aug-2005 marcel

Improve SMP support:
o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
uses to look up translations it can't find in the TLB. As such,
the VHPT serves as a level 1 cache (the TLB being a level 0 cache)
and best results are obtained when it's not shared between CPUs.
The collision chain (i.e. the hash bucket) is shared between CPUs,
as all buckets together constitute our collection of PTEs. To
achieve this, the collision chain does not point to the first PTE
in the list anymore, but to a hash bucket head structure. The
head structure contains the pointer to the first PTE in the list,
as well as a mutex to lock the bucket. Thus, each bucket is locked
independently of each other. With at least 1024 buckets in the VHPT,
this provides sufficiently fine-grained locking to make the
solution scalable to large SMP machines. (A sketch of the bucket
head structure follows at the end of this entry.)
o Add synchronisation to the lazy FP context switching. We do this
with a separate per-thread lock. On SMP machines the lazy high FP
context switching without synchronisation caused inconsistent
state, which resulted in a panic. Since the use of the high FP
registers is not common, it's possible that races exist. The ia64
package build has proven to be a good stress test, so this will
get plenty of exercise in the near future.
o Don't use the local ID of the processor we want to send the IPI to
as the argument to ipi_send(). Use the struct pcpu pointer instead.
The reason for this is that IPI delivery is unreliable. It has been
observed that sending an IPI to a CPU causes it to receive a stray
external interrupt. As such, we need a way to make the delivery
reliable. The intended solution is to queue requests in the target
CPU's per-CPU structure and use a single IPI to inform the CPU that
there's a new entry in the queue. If that IPI gets lost, the CPU
can check its queue at any convenient time (such as for each
clock interrupt). This also allows us to send requests to a CPU
without interrupting it, if such would be beneficial.

With these changes SMP is almost working. There are still some random
process crashes and the machine can hang due to having the IPI lost
that deals with the high FP context switch.

The overhead of introducing the hash bucket head structure results
in a performance degradation of about 1% for UP (extra pointer
indirection). This is surprisingly small and is offset by gaining
reasonably good, scalable SMP support.
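
A sketch of the bucket head structure described in the first item, with
illustrative field names (not the committed layout):

#include <sys/lock.h>
#include <sys/mutex.h>

struct ia64_lpte;		/* long-format PTE, defined by the pmap code */

/*
 * Per-bucket head: the collision chain hangs off vb_first and is
 * protected by vb_mtx, so each bucket locks independently of the others.
 */
struct vhpt_bucket {
	struct ia64_lpte	*vb_first;
	struct mtx		 vb_mtx;
};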


# 144971 12-Apr-2005 jhb

Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.


# 144962 12-Apr-2005 marcel

Dot the i's:
1 Move the debug.clock_adjust_* sysctls to debug.clock.adjust_* to
make it easier to get only the clock statistics.
2 Make the sysctls read-only [suggested by Marius].
3 When determining the new clock adjustment, we checked for an error
either larger than 12.5% or smaller than 12.5%. We left out an error
of exactly 12.5%. For errors larger than 12.5% we adjust the clock
reload value in such a way that the next clock interrupt would be
early (as in premature). For errors less than 12.5% we stopped the
adjustment.
The current algorithm doesn't benefit from excluding an error of
exactly 12.5%. Change the code to stop adjusting the clock if the
error is *not* larger than 12.5% [suggested by Marius].

Discussed with: marius@


# 139790 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 131481 02-Jul-2004 jhb

Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switched to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by: scottl (with his re@ hat)


# 129022 07-May-2004 marcel

Make sure to sanitize the FP status register. Specifically this
masks all FP traps, which should not happen in the kernel.


# 124737 20-Jan-2004 marcel

s/framep/tf/g -- this normalizes on the use of tf to point to the
trapframe and improves grep-ability.


# 122841 17-Nov-2003 peter

Widen the enable/disable helper function's argument in line with the
ithread_create() changes etc. This should be mostly a NOP.


# 122518 11-Nov-2003 marcel

Further work-out the handling of the high FP registers. The most
important change is in cpu_switch() where we disable the high FP
registers for the thread that we switch-out if the CPU currently
has its high FP registers. This avoids that the high FP registers
remain enabled for the thread even when the CPU has unloaded them
or the thread migrated to another processor.
Likewise, when we switch-in a thread that has its high FP
registers on the CPU, we enable them. This avoids an otherwise
harmless, but unnecessary trap to have them enabled.

The code that handles the disabled high FP trap (in trap()) has
been turned into a critical section for the most part to avoid
being preempted. If there's a race, we bail out and have the
processor trap again if necessary.

Avoid using the generic ia64_highfp_save() function when the
context is predictable. The function adds unnecessary overhead.
Don't use ia64_highfp_load() for the same reason. The function
is now unused and can be removed.

These changes make the lazy context switching of the high FP
registers in an UP kernel functional.


# 119970 10-Sep-2003 marcel

Rewrite the SAPIC initialization to always program the RTEs with what
we think is the correct trigger mode and polarity. This allows us to
implement BUS_CONFIG_INTR() as an update of the RTE in question.
Consequently, we can trust the RTE when we enable an interrupt and
avoid needing to know about the trigger mode and polarity at
that time.


# 119787 05-Sep-2003 marcel

Fix a place where I forgot to change the code that checks whether
we return to kernel or userland. This triggered a panic in a KSE
application when TDF_USTATCLOCK was set in the case userland was
interrupted, but we never called ast() on our way out. As such,
we called ast() at some other time. Unfortunately, TDF_USTATCLOCK
handling assumes running in the interrupt thread. This was not
the case anymore.

To avoid making the same mistake later, interrupt() now returns
to its caller whether we interrupted userland or not. This avoids
having to duplicate the check in assembly, where it's bound
to fall off the scope. Now we simply check the return value and
call ast() if appropriate.

Run into this: davidxu


# 118990 16-Aug-2003 marcel

Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.

ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).

alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a separate commit.

powerpc: MD prototypes left in cpu.h. Comment added.

Suggested by: bde
Tested with: make universe (pc98 incomplete)


# 118414 04-Aug-2003 marcel

Cleanup the clock code. This includes:
o Remove alpha specific timer code (mc146818A) and compiled-out
calibration of said timer.
o Remove i386 inherited timer code (i8253) and related acquire and
release functions.
o Move sysbeep() from clock.c to machdep.c and have it return
ENODEV. Console beeps should be implemented using ACPI or if no
such device is described, using the sound driver.
o Move the sysctls related to adjkerntz, disable_rtc_set and
wall_cmos_clock from machdep.c to clock.c, where the variables
are.
o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't
bother faking a stathz that's 1/8 of that. Keep it simple: hz
defaults to HZ and stathz equals hz. This is also how it's done
for sparc64.
o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj)
to calculate ITC skew and corrections. On average, we adjust the
ITC match register once every ~1500 interrupts for a duration of
2 consecutive interrupts. This is to correct the non-deterministic
behaviour of the ITC interrupt (there's a delay between the match
and the raising of the interrupt).
o Add 4 debugging sysctls to monitor clock behaviour. Those are
debug.clock_adjust_edges, debug.clock_adjust_excess,
debug.clock_adjust_lost and debug.clock_adjust_ticks. The first
counts the individual adjustment cycles (when the skew first
crosses the threshold), the second counts the number of times the
adjustment was excessive (any non-zero value is to be considered
a bug), the third counts lost clock interrupts and the last counts
the number of interrupts for which we applied an adjustment
(debug.clock_adjust_ticks / debug.clock_adjust_edges gives the
average duration of an individual adjustment -- should be ~2).

While here, remove some nearby (trivial) left-overs from alpha and
other cleanups.


# 115084 16-May-2003 marcel

Revamp of the syscall path, exception and context handling. The
prime objectives are:
o Implement a syscall path based on the epc instruction (see
sys/ia64/ia64/syscall.s).
o Revisit the places where we need to save and restore registers
and define those contexts in terms of the register sets (see
sys/ia64/include/_regset.h).

Secondary objectives:
o Remove the requirement to use contigmalloc for kernel stacks.
o Better handling of the high FP registers for SMP systems.
o Switch to the new cpu_switch() and cpu_throw() semantics.
o Add a good unwinder to reconstruct contexts for the rare
cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o The EPC syscall doesn't preserve registers it does not need
to preserve and places the arguments differently on the stack.
This affects libc and truss.
o The address of the kernel page directory (kptdir) had to
be unstaticized for use by the nested TLB fault handler.
The name has been changed to ia64_kptdir to avoid conflicts.
The renaming affects libkvm.
o The trapframe only contains the special registers and the
scratch registers. For syscalls using the EPC syscall path
no scratch registers are saved. This affects all places where
the trapframe is accessed. Most notably the unaligned access
handler, the signal delivery code and the debugger.
o Context switching only partly saves the special registers
and the preserved registers. This affects cpu_switch() and
triggered the move to the new semantics, which additionally
affects cpu_throw().
o The high FP registers are either in the PCB or on some
CPU. Context switching for them is done lazily. This affects
trap().
o The mcontext has room for all registers, but not all of them
have to be defined in all cases. This mostly affects signal
delivery code now. The *context syscalls are as of yet still
unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticeable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be noticed as an improvement, if it's noticed at all.

Approved by: re@ (jhb)


# 110296 03-Feb-2003 jake

Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway. This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by: jhb, tmm
Tested on: i386, sparc64


# 110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


# 109911 26-Jan-2003 julian

Unbreak SMP cases for these architectures.
statclock_process() changed arguments.
note: it may be worth checking if curkse is needed on these architectures..
(and if so, why?)


# 108759 06-Jan-2003 marcel

Move ia64_sapics and ia64_sapic_count from interrupt.c to sapic.c
and declare them extern in interrupt.c. This eliminates the need
for ia64_add_sapic(), which is called from sapic.c.
While here, reformat ia64_enable() in interrupt.c to improve
indentation and add a sysctl (machdep.apic) to dump the I/O APIC
entries currently programmed into all I/O APICs. The latter can
help analyze interrupt problems.
Note that the sysctl is not intended as a userland (software)
interface. It may be changed in the future to include counters
so that vmstat -i can make use of it. It may also be removed...


# 108757 05-Jan-2003 peter

Move the itm reload to a single place rather than having two identical
copies of the reload. Note that we use the precomputed itm_reload value
so that we can avoid a division in the kernel. The ia64 cpu does not
have integer divide, so this would have been done by a floating point
operation.


# 108756 05-Jan-2003 marcel

Replace the hardcoding of 255 as the clock interrupt vector with
CLOCK_VECTOR and define it as 254, not 255. Vector 255 is already
in use as the AP wakeup vector on the HP rx2600.

This needs to be made more dynamic. The likelihood of vector 254
being in use is pretty small, but we already have code to assign
vectors to IPIs (see sal.c) and it's probably better to have a
centralized "vector manager" that hands out vectors based on
some input (like priority).


# 108751 05-Jan-2003 marcel

Manually inline handleclock(). There's only a single caller and
handleclock itself is trivial.

While here, replace (itc_frequency+hz/2)/hz with itm_reload for
consistency. There's now a single place where we determine the
ITM reload value.
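
A sketch of that single place, using the expression quoted above (the
function name is illustrative):

#include <sys/types.h>

static uint64_t itm_reload;	/* ITC ticks per clock interrupt */

/*
 * Computed once at clock setup; the interrupt path then just adds
 * itm_reload to cr.itm, so it never divides (the ia64 has no integer
 * divide instruction).
 */
static void
clock_configure(uint64_t itc_frequency, int hz)
{
	itm_reload = (itc_frequency + hz / 2) / hz;	/* rounded to nearest */
}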


# 108749 05-Jan-2003 marcel

Count interrupts as soon as possible. This makes sure interrupts are
counted even when there are no handlers.


# 106066 27-Oct-2002 marcel

Make vmstat -i work:
o Properly set the pointer to the counter for each interrupt and
update the intrnames table.
o Remove Alpha cruft from intrcnt.h.
o Create INTRNAME_LEN as the single entity that defines the width
of the names in the intrnames table (incl. terminating '\0').


# 104433 03-Oct-2002 peter

Do a bit of rude hackery to get clock interrupts on all CPUs. This
is partly based on the Alpha system which duplicates the clock to
each cpu, instead of doing a clock roundrobin like on i386. This means
we get hz * ncpu clocks per second and so we have to separate clock
sampling from actual 'do the work' clock processing. The BSP runs the
complete processing, the rest just sample state etc.

Using the on-cpu interval timer is not ideal as it will drift. There
is more to be done here, we should use an external clock source.


# 103436 16-Sep-2002 peter

Initiate deorbit burn for the i386-only a.out related support. Moves are
under way to move the remnants of the a.out toolchain to ports. As the
comment in src/Makefile said, this stuff is deprecated and one should not
expect this to remain beyond 4.0-REL. It has already lasted WAY beyond
that.

Notable exceptions:
gcc - I have not touched the a.out generation stuff there.
ldd/ldconfig - still have some code to interface with a.out rtld.
old as/ld/etc - I have not removed these yet, pending their move to ports.
some includes - necessary for ldd/ldconfig for now.

Tested on: i386 (extensively), alpha


# 96912 19-May-2002 marcel

o Remove namespace pollution from param.h:
- Don't include ia64_cpu.h and cpu.h
- Guard definitions by _NO_NAMESPACE_POLLUTION
- Move definition of KERNBASE to vmparam.h

o Move definitions of IA64_RR_{BASE|MASK} to vmparam.h
o Move definitions of IA64_PHYS_TO_RR{6|7} to vmparam.h

o While here, remove some left-over Alpha references.


# 96442 12-May-2002 marcel

o Rename ia64_count_aps to ia64_count_cpus and reimplement the
function to return the total number of CPUs and not the highest
CPU id.
o Define mp_maxid based on the minimum of the actual number of
CPUs in the system and MAXCPU.
o In cpu_mp_add, when the CPU id of the CPU we're trying to add
is larger than mp_maxid, don't add the CPU. Formerly this was
based on MAXCPU. Don't count CPUs when we add them. We already
know how many CPUs exist.
o Replace MAXCPU with mp_maxid when used in loops that iterate
over the id space. This avoids a couple of useless iterations.
o In cpu_mp_unleash, use the number of CPUs to determine if we
need to launch the CPUs.
o Remove mp_hardware as it's not used anymore.
o Move the IPI vector array from mp_machdep.c to sal.c. We use
the array as a centralized place to collect vector assignments.
Note that we still assign vectors to SMP specific IPIs in
non-SMP configurations. Rename the array from mp_ipi_vector to
ipi_vector.
o Add IPI_MCA_RENDEZ and IPI_MCA_CMCV. These are used by MCA.
Note that IPI_MCA_CMCV is not SMP specific.
o Initialize the ipi_vector array so that we place the IPIs in
sensible priority classes. The classes are relative to where
the AP wake-up vector is located to guarantee that it's the
highest priority (external) interrupt. Class assignment is
as follows:
class  IPI                    notes
  x    AP wake-up             (normally x=15)
 x-1   MCA rendezvous
 x-2   AST, Rendezvous, stop
 x-3   CMCV, test


# 93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 92268 14-Mar-2002 dfr

* Add some KTR messages for IPIs.
* Don't call ast() from interrupt() - if we switch, then we will miss
writing cr.eoi which will prevent the current cpu from receiving
interrupts until the current thread is resumed. The call to ast()
happens magically in exception_restore where it is safe.
* Add DDB 'show irq' command to examine interrupt hardware state.


# 92105 11-Mar-2002 jhb

Fix a misspelling of mine: s/optomization/optimization/.

Noticed by: bmilekic


# 91669 05-Mar-2002 marcel

Call ast() only when we're handling a user trap.


# 88903 05-Jan-2002 peter

Convert a bunch of 1 << PCPU_GET(cpuid) to PCPU_GET(cpumask).


# 88900 05-Jan-2002 jhb

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex release and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


# 88687 30-Dec-2001 marcel

Draft implementation of IPI handling.


# 85684 29-Oct-2001 dfr

Make the various bits of SMP code conditional on SMP so that I can still
build non-SMP kernels.


# 85674 29-Oct-2001 marcel

o Send a test IPI from the BSP to itself at the same time APs
are woken up.
o Make IPIs synchronous by default. If we want asynchronous
IPIs, we may want to make the memory fence controllable.


# 85670 29-Oct-2001 marcel

Make the clock vector 255 instead of 240. On Lion boxes, 240 is
the AP wake-up vector. We probably want a more dynamic approach
to assigning vectors in the future...


# 84541 05-Oct-2001 dfr

Wire up most of the interrupt handling infrastructure. Not sure it works
right yet but it's enough for the ATA probe to work. The SCSI probes which
follow are broken though.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 78887 27-Jun-2001 jhb

Make this compile again. Broken since June 1.


# 76410 09-May-2001 jhb

Add needed sys/lock.h include.


# 75700 19-Apr-2001 dfr

Don't take the Giant mutex for clock interrupts.


# 74732 24-Mar-2001 jhb

Stick a prototype for handleclock() in machine/clock.h and include it
in interrupt.c to quiet a warning.


# 72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarly, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
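
A short before/after illustration of the new caller interface (a sketch
only; the lock arguments are placeholders):

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static void
example(struct mtx *sleep_m, struct mtx *spin_m)
{
	/* was: mtx_enter(sleep_m, MTX_DEF); */
	mtx_lock(sleep_m);
	mtx_unlock(sleep_m);

	/* was: mtx_enter(spin_m, MTX_SPIN); */
	mtx_lock_spin(spin_m);
	mtx_unlock_spin(spin_m);

	/* flags now go through the explicit wrappers */
	mtx_lock_flags(sleep_m, MTX_QUIET);
	mtx_unlock_flags(sleep_m, MTX_QUIET);
}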

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 71337 21-Jan-2001 jake

Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By: peter


# 70861 10-Jan-2001 jake

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.


# 67522 24-Oct-2000 dfr

* Various fixes to breakage introduced by the atomic and mutex reorgs.
* Fixes to the signal delivery code. Not quite right yet.

I would have preferred to wait until I have signal delivery actually
working but the current kernel in CVS doesn't build.


# 67195 16-Oct-2000 dfr

Remember to re-initialise cr.itm on clock interrupts so that we get more
than just one tick.


# 67032 12-Oct-2000 dfr

Implement a rudimentary interrupt handling system which should be good
enough for clock interrupts in SKI.


# 66458 29-Sep-2000 dfr

This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will
not work on any real hardware (or fully work on any simulator). Much more
needs to happen before this is actually functional but its nice to see
the FreeBSD copyright message appear in the ia64 simulator.