#
303975 |
|
11-Aug-2016 |
gjb |
Copy stable/11@r303970 to releng/11.0 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, and rename it to RC1.
Update __FreeBSD_version.
Use the quarterly branch for the default pkg(8) repository in FreeBSD.conf and for populating the packages on dvd1.iso.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
302408 |
|
08-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
277528 |
|
22-Jan-2015 |
hselasky |
Revert r277213:
FreeBSD developers need more time to review patches in the surrounding areas, such as the TCP stack, which use MPSAFE callouts, and to restore the distribution of callouts across multiple CPUs.
Bump __FreeBSD_version again instead of reverting the earlier bump.
Suggested by: kmacy, adrian, glebius and kib Differential Revision: https://reviews.freebsd.org/D1438
|
#
277213 |
|
15-Jan-2015 |
hselasky |
Major callout subsystem cleanup and rewrite:
- Close a migration race where callout_reset() failed to set the CALLOUT_ACTIVE flag.
- Callout callback functions are now allowed to be protected by spinlocks.
- Switching the callout CPU number cannot always be done on a per-callout basis. See the updated timeout(9) manual page for more information.
- The timeout(9) manual page has been updated to reflect how all the functions inside the callout API work. The manual page has been made function-oriented to make it easier to deduce how each of the functions making up the callout API works without having to first read the whole manual page. All functions are grouped into a handful of sections, which should give a quick top-level overview of when the different functions should be used.
- The CALLOUT_SHAREDLOCK flag and its functionality have been removed to reduce the complexity of the callout code and to avoid problems with atomically stopping callouts via callout_stop(). If someone needs it, it can be re-added. From my quick grep there are no CALLOUT_SHAREDLOCK clients in the kernel.
- A new callout API function named "callout_drain_async()" has been added. See the updated timeout(9) manual page for a complete description.
- Update the callout clients in the "kern/" folder to use the callout API properly, like cv_timedwait(). Previously there was some custom sleepqueue code in the callout subsystem, which has been removed because we now allow callouts to be protected by spinlocks. This allows us to tear down the callout like we do with regular mutexes, and a "td_slpmutex" has been added to "struct thread" to atomically tear down the "td_slpcallout". Further, the "TDF_TIMOFAIL" and "SWT_SLEEPQTIMO" states can now be completely removed. Currently they are marked as available and will be cleaned up in a follow-up commit.
- Bump __FreeBSD_version to indicate that kernel modules need recompilation.
- There have been several reports that this patch "seems to squash a serious bug leading to a callout timeout and panic".
Kernel build testing: all architectures were built MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D1438 Sponsored by: Mellanox Technologies Reviewed by: jhb, adrian, sbruno and emaste
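For orientation, a hedged sketch of the basic callout(9) pattern this rewrite touches. As the log above records, the rewrite itself was later backed out (r277528), so the sketch sticks to long-standing calls (callout_init_mtx(), callout_reset(), callout_drain()) rather than the new API; the my_* names are hypothetical:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/callout.h>

    struct my_softc {
        struct mtx      sc_mtx;     /* lock the callout runs under */
        struct callout  sc_timer;
    };

    static void
    my_timer_fn(void *arg)
    {
        struct my_softc *sc = arg;

        /* Invoked with sc_mtx held, courtesy of callout_init_mtx(). */
        callout_reset(&sc->sc_timer, hz, my_timer_fn, sc);  /* rearm */
    }

    static void
    my_start(struct my_softc *sc)
    {
        mtx_init(&sc->sc_mtx, "my softc", NULL, MTX_DEF);
        callout_init_mtx(&sc->sc_timer, &sc->sc_mtx, 0);
        callout_reset(&sc->sc_timer, hz, my_timer_fn, sc);
    }

    static void
    my_stop(struct my_softc *sc)
    {
        /* callout_drain() also waits for an in-flight callback. */
        callout_drain(&sc->sc_timer);
    }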
|
#
244046 |
|
09-Dec-2012 |
attilio |
Add a comment on why inlining critical_enter() may not be a good idea for the general case.
Reviewed by: bde MFC after: 1 week
|
#
228265 |
|
04-Dec-2011 |
avg |
critical_exit: ignore td_owepreempt if kdb_active is set
Calling mi_switch() in such a context results in recursion via kdb_switch().
Suggested by: jhb Reviewed by: jhb MFC after: 5 weeks
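A hedged sketch of the shape of the check, not the verbatim kern_switch.c code (the flag names match that era; my_critical_exit() is illustrative only):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/kdb.h>

    void
    my_critical_exit(void)
    {
        struct thread *td = curthread;

        KASSERT(td->td_critnest != 0, ("critical_exit: td_critnest == 0"));
        if (td->td_critnest == 1) {
            td->td_critnest = 0;
            /* Skip the deferred preemption while the debugger is
             * active: mi_switch() would recurse via kdb_switch(). */
            if (td->td_owepreempt && !kdb_active) {
                td->td_critnest = 1;    /* block re-entry */
                thread_lock(td);
                td->td_critnest--;
                mi_switch(SW_INVOL | SW_PREEMPT | SWT_OWEPREEMPT, NULL);
                thread_unlock(td);
            }
        } else
            td->td_critnest--;
    }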
|
#
209059 |
|
11-Jun-2010 |
jhb |
Update several places that iterate over CPUs to use CPU_FOREACH().
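The transformation this describes, sketched; the loop body and my_percpu_walk() are hypothetical:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/pcpu.h>
    #include <sys/smp.h>

    static void
    my_percpu_walk(void)
    {
        struct pcpu *pc;
        int i;

        /* Before: open-coded iteration that must skip absent CPU IDs. */
        for (i = 0; i <= mp_maxid; i++) {
            if (CPU_ABSENT(i))
                continue;
            pc = pcpu_find(i);
            printf("cpu%d: pcpu at %p\n", i, pc);
        }

        /* After: CPU_FOREACH() hides the mp_maxid/CPU_ABSENT() dance. */
        CPU_FOREACH(i) {
            pc = pcpu_find(i);
            printf("cpu%d: pcpu at %p\n", i, pc);
        }
    }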
|
#
194936 |
|
25-Jun-2009 |
jeff |
- Use DPCPU for SCHED_STATS. This is somewhat awkward because the offset of the stat is not known until link time, so we must emit a function to call SYSCTL_ADD_PROC rather than using SYSCTL_PROC directly.
- Eliminate the atomic from SCHED_STAT_INC now that it's using per-cpu variables. Sched stats are always incremented while we're holding a spinlock, so no further protection is required.
Reviewed by: sam
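A rough sketch of the per-CPU counter pattern described here; the my_* names are hypothetical and the real SCHED_STATS glue differs:

    #include <sys/param.h>
    #include <sys/pcpu.h>
    #include <sys/smp.h>

    /* One copy of the counter per CPU, placed in the dpcpu linker set. */
    DPCPU_DEFINE(unsigned long, my_sched_stat);

    /* No atomic needed: callers already hold a spinlock, so the local
     * CPU's copy cannot be raced on while we increment it. */
    #define MY_SCHED_STAT_INC() \
        ((*DPCPU_PTR(my_sched_stat))++)

    /* Exporting a total (e.g. via sysctl) sums every CPU's copy. */
    static unsigned long
    my_sched_stat_sum(void)
    {
        unsigned long total = 0;
        int i;

        CPU_FOREACH(i)
            total += DPCPU_ID_GET(i, my_sched_stat);
        return (total);
    }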
|
#
178961 |
|
12-May-2008 |
julian |
fix typo in runq_fuzz
Noticed by: Elijah Buck
|
#
178272 |
|
17-Apr-2008 |
jeff |
- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes.
- In reset, walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly.
- Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way.
- Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption, ithreads idling, etc.
- Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc.
Sponsored by: Nokia
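The flag convention described above, as it appears at a call site; an era-approximate sketch (SW_VOL and SWT_RELINQUISH follow this commit's naming, my_yield() is hypothetical):

    #include <sys/param.h>
    #include <sys/proc.h>

    /* The low bits of the mi_switch() flags carry an SWT_* switch
     * type, which indexes the matching SCHED_STATS counter. */
    static void
    my_yield(struct thread *td)
    {
        thread_lock(td);
        mi_switch(SW_VOL | SWT_RELINQUISH, NULL);   /* voluntary yield */
        thread_unlock(td);
    }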
|
#
177435 |
|
20-Mar-2008 |
jeff |
- Restore runq to manipulating threads directly by putting runq links and rqindex back in struct thread.
- Compile kern_switch.c independently again and stop #include'ing it from schedulers.
- Remove the ts_thread backpointers and convert most code to go from struct thread to struct td_sched.
- Clean up the ts_flags #define garbage that was causing us to sometimes do things that expanded to td->td_sched->ts_thread->td_flags in 4BSD.
- Export the kern.sched sysctl node in sysctl.h
|
#
177428 |
|
20-Mar-2008 |
jeff |
- Remove the unused and redundant sched_newproc() function.
- Remove the unused and redundant sched_newthread(), which peeks into scheduler-private structures.
|
#
177419 |
|
20-Mar-2008 |
jeff |
- Move maybe_preempt() from kern_switch.c to sched_4bsd.c. This function is only used by 4BSD.
- Create a new runq_choose_fuzz() function rather than polluting runq_choose() with 4BSD-specific code.
- Move the fuzz sysctl into sched_4bsd.c.
- Remove some dead code from kern_switch.c.
|
#
177253 |
|
16-Mar-2008 |
rwatson |
In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr.
MFC after: 1 month Discussed with: imp, rink
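For illustration, a minimal sketch of the style in question; my_init() and its subsystem/order choice are hypothetical:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>

    static void
    my_init(void *arg __unused)
    {
        printf("my_init ran at SI_SUB_LAST\n");
    }
    /* Trailing ';' so tools that parse C see a complete statement. */
    SYSINIT(my_init, SI_SUB_LAST, SI_ORDER_ANY, my_init, NULL);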
|
#
177091 |
|
12-Mar-2008 |
jeff |
Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries; static binaries will be broken.
|
#
173600 |
|
14-Nov-2007 |
julian |
generally we are interested in what thread did something as opposed to what process. Since threads by default have the name of the process unless overwritten with more useful information, just print the thread name instead.
|
#
172481 |
|
08-Oct-2007 |
jeff |
- Fix ULE in kernels without PREEMPTION compiled in by always enabling the critical_exit() owepreempt check. ULE will always use owepreempt to preempt the idle thread. This change does not affect 4BSD, since it will never set owepreempt without PREEMPTION enabled.
- Remove some unused code from choosethread().
Discussed with: jhb Approved by: re
|
#
172256 |
|
20-Sep-2007 |
attilio |
Fix some entries in the static locks table of witness. In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- The smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in the table, however, has been the source of a false-positive LOR report involving the dt_lock. (The smp rendezvous lock would have had sched_lock before it under the older ordering, so it wasn't a leaf lock even then.)
- allpmaps is only used on the ia32 architecture, so it is inserted in the appropriate stub.
Additionally:
- kse_zombie_lock is no longer present, so its definition is axed.
- zombie_lock doesn't need an exported symbol, so just let it be declared static.
Tested by: kris Approved by: jeff (mentor) Approved by: re
|
#
172207 |
|
17-Sep-2007 |
jeff |
- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or, previously, the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler-related swap flags into td_flags.
- Keep ki_sflag for backwards compatibility but change all in-tree tools to use the new and more correct location of P_INMEM.
Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)
|
#
171900 |
|
20-Aug-2007 |
jeff |
- Improve runq_findbit_from(), which is used by ULE's circular queue. Mask off the bits we want to ignore on the first pass rather than doing a linear scan. This puts us within a few instructions of the cost of runq_findbit() and removes this function from the top of profiling output for context-switch-heavy workloads.
Approved by: re
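The masking idea in a standalone, runnable sketch; names and the 64-bit-long assumption are illustrative, not the kernel's rqb code:

    #include <strings.h>    /* ffsl() */

    /*
     * Find the first set bit at or after 'from' in a status word,
     * wrapping around: mask off the low-order bits below 'from' and
     * scan once, instead of walking the word bit by bit.
     */
    static int
    findbit_from(unsigned long status, int from)
    {
        unsigned long masked;

        masked = status & ~((1UL << from) - 1); /* ignore bits < from */
        if (masked != 0)
            return (ffsl(masked) - 1);
        if (status != 0)
            return (ffsl(status) - 1);          /* wrap to the start */
        return (-1);                            /* all queues empty */
    }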
|
#
171712 |
|
03-Aug-2007 |
jeff |
- Set SW_PREEMPT when we preempt in critical_exit().
Approved by: re
|
#
171495 |
|
19-Jul-2007 |
jeff |
- Remove explicit references to sched_lock. A simpler assert will do.
Approved by: re
|
#
170631 |
|
12-Jun-2007 |
jeff |
- Garbage collect unused concurrency functions.
|
#
170293 |
|
04-Jun-2007 |
jeff |
Commit 1/14 of sched_lock decomposition.
- Move all scheduler locking into the schedulers, utilizing a technique similar to Solaris's container locking.
- A per-process spinlock is now used to protect the queue of threads, thread count, suspension count, p_sflags, and other process-related scheduling fields.
- The new thread lock is actually a pointer to a spinlock for the container that the thread is currently owned by. The container may be a turnstile, sleepqueue, or run queue.
- thread_lock() is now used to protect access to thread-related scheduling fields. thread_unlock() unlocks the lock, and thread_set_lock() implements the transition from one lock to another.
- A new "blocked_lock" is used in cases where it is not safe to hold the actual thread's lock yet we must prevent access to the thread.
- sched_throw() and sched_fork_exit() are introduced to allow the schedulers to fix up locking at these points.
- Add some minor infrastructure for optionally exporting scheduler statistics that were invaluable in solving performance problems with this patch. Generally these statistics allow you to differentiate between different causes of context switches.
Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
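The per-container locking idiom this introduces, sketched; my_set_priority() is hypothetical, and the td_priority update via sched_prio() is just one example of a thread scheduling field:

    #include <sys/param.h>
    #include <sys/proc.h>
    #include <sys/sched.h>

    /*
     * thread_lock() locks whatever spinlock the thread's lock pointer
     * currently designates (a run queue, sleepqueue, or turnstile
     * lock), retrying if the container changes underneath us.
     */
    static void
    my_set_priority(struct thread *td, u_char pri)
    {
        thread_lock(td);
        sched_prio(td, pri);    /* thread sched fields now stable */
        thread_unlock(td);
    }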
|
#
166557 |
|
08-Feb-2007 |
jeff |
- Change types for recent runq additions to u_char rather than int.
- Fix these types in ULE as well. This fixes bugs in priority index calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64.
Reported by: kkenn using pho's stress2.
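The arithmetic gotcha behind the fix, as a runnable userland demo:

    #include <stdio.h>

    int
    main(void)
    {
        int          spri = -1;     /* signed index underflow */
        unsigned int upri = -1u;    /* same bit pattern, unsigned */

        /* C99 signed % truncates toward zero, so -1 % 64 == -1, an
         * out-of-range run queue index; unsigned wraps to 63. */
        printf("(int)-1 %% 64  = %d\n", spri % 64);     /* prints -1 */
        printf("(uint)-1 %% 64 = %u\n", upri % 64);     /* prints 63 */
        return (0);
    }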
|
#
166188 |
|
23-Jan-2007 |
jeff |
- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state settings were moved to the individual schedulers. sched_add() was chosen to displace it for naming-consistency reasons.
- Remove adjustrunqueue; it was four lines of code that was ifdef'd to be different on all three schedulers, where it was only called in one place each.
- Remove the long ifdef'd-out remrunqueue code.
- Remove the now redundant ts_state. Inspect the thread state directly.
- Don't set TSF_* flags from kern_switch.c; we were only doing this to support a feature in one scheduler.
- Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links, kern_switch.c mostly does not care about the contents of td_sched.
Discussed with: julian
- Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers.
Suggested by: jhb
Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.
|
#
165765 |
|
04-Jan-2007 |
jeff |
- Don't pass a pointer into runq_choose_from(). The caller can adjust the index if it chooses to.
|
#
165761 |
|
04-Jan-2007 |
jeff |
- Add three new functions to support circular run queues.
- runq_add_pri allows the caller to position the thread at any rqindex regardless of priority.
- runq_choose_from() chooses the lowest-priority thread starting from a given index. The index is updated with the rqindex of the chosen thread. This routine is used to pick the lowest priority relative to a given index.
- runq_remove_idx() updates the index if the run queue that held the removed thread is now empty.
|
#
165693 |
|
31-Dec-2006 |
rwatson |
Prefer a more traditional spelling of inhibited in comments and panic messages.
|
#
164936 |
|
06-Dec-2006 |
julian |
Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent. Specifically, remove any reference to the ksegrp structure: this feature was never fully utilised and made things overly complicated. Also remove all code in the scheduler that tried to make threaded programs fair to unthreaded programs; libpthread processes will already do this to some extent, and libthr processes already disable it.
Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable.
The ULE scheduler compiles again but I have no idea if it works.
The 4bsd scheduler still requires a little cleaning, and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit.
Tested by David Xu and Dan Eischen using libthr and libpthread.
|
#
163709 |
|
26-Oct-2006 |
jb |
Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE).
Reviewed by: davidxu@
|
#
159570 |
|
13-Jun-2006 |
davidxu |
Add the CORE scheduler, work I did half a year ago and recently picked up again. The scheduler is forked from ULE, but the algorithm for detecting an interactive process is almost completely different from ULE's; it comes from the Linux paper "Understanding the Linux 2.6.8.1 CPU Scheduler", although I still use the same word "score" as a priority boost, as in the ULE scheduler.
Briefly, the scheduler has the following characteristics:
1. A timesharing process's nice value is seriously respected; the timeslice and the interactivity-detection algorithm are based on the nice value.
2. Per-CPU scheduling queues and load balancing.
3. O(1) scheduling.
4. Some CPU-affinity code in the wakeup path.
5. Support for POSIX SCHED_FIFO and SCHED_RR.
Unlike the 4BSD and ULE schedulers, which use the fuzzy RQ_PPQ, this scheduler uses 256 priority queues. Unlike ULE, which uses both pull and push, this scheduler uses only the pull method; the main reason is to let a relatively idle CPU do the work. But currently the whole scheduler is protected by the big sched_lock, so the benefit is not visible, and it can actually be worse than nothing, because all other CPUs are locked out while we are doing balancing work; the 4BSD scheduler does not have this problem. The scheduler does not support hyperthreading very well; in fact, it does not distinguish between physical and logical CPUs, and this should be improved in the future. The scheduler has a priority-inversion problem on MP machines, so it is not good for realtime scheduling and can cause realtime processes to starve. As a result, it seems MySQL super-smack runs better on my Pentium-D machine when using libthr, whether on a UP or an SMP kernel.
|
#
159154 |
|
01-Jun-2006 |
cognet |
sched_rem() already sets ke->ke_state to KES_THREAD, so there's no need to redo it.
|
#
153797 |
|
28-Dec-2005 |
kan |
Trim trailing whitespace.
|
#
153510 |
|
18-Dec-2005 |
njl |
Restore KTR_CRITICAL but conditionally compile it in as KTR_SCHED.
Requested by: scottl, jhb
|
#
153493 |
|
17-Dec-2005 |
njl |
Clean up unused or poorly utilized KTR values. Remove KTR_FS, KTR_KGDB, and KTR_IO as they were never used. Remove KTR_CLK since it was only used for hardclock firing and use KTR_INTR there instead. Remove KTR_CRITICAL since it was only used for crit enter/exit and use KTR_CONTENTION instead.
|
#
148661 |
|
03-Aug-2005 |
davidxu |
In adjustrunqueue(), add code to handle the thread-migration case for the ULE scheduler. In the original code, the local run queue of a threaded ksegrp is corrupted if adjustrunqueue() is called while a thread is migrating.
|
#
147216 |
|
10-Jun-2005 |
ups |
Restore preemption of idle threads.
Submitted by: jhb
|
#
147190 |
|
09-Jun-2005 |
ups |
Lots of whitespace cleanup. Fix a broken if condition.
Submitted by: nate@
|
#
147182 |
|
09-Jun-2005 |
ups |
Fix some race conditions for pinned threads that may cause them to run on the wrong CPU.
Add IPI support for preempting a thread on another CPU.
MFC after: 3 weeks
|
#
146554 |
|
23-May-2005 |
ups |
Use low level constructs borrowed from interrupt threads to wait for work in proc0. Remove the TDP_WAKEPROC0 workaround.
|
#
146362 |
|
19-May-2005 |
ups |
Fix a bug that caused preemption to happen for a thread in the same ksegrp with the same priority as the currently running thread. This can cause propagate_priority() to panic.
Pointy hat to: ups
|
#
144777 |
|
08-Apr-2005 |
ups |
Sprinkle some volatile magic and rearrange things a bit to avoid race conditions in critical_exit now that it no longer blocks interrupts.
Reviewed by: jhb
|
#
144637 |
|
04-Apr-2005 |
jhb |
Divorce critical sections from spinlocks. Critical sections as denoted by critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any effect on interrupts. This means that standalone critical sections are now very cheap, as they are simply unlocked integer increments and decrements for the common case.
Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch.
This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example).
Reviewed by: grehan, cognet, arch@, others Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
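The resulting division of labor at a call site, in a hedged sketch; my_percpu_update() is hypothetical:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>

    /*
     * critical_enter() now only defers preemption (a per-thread
     * nesting count); the MD spinlock_enter()/spinlock_exit() KPI,
     * used by spin mutexes, is what still blocks interrupts.
     */
    static void
    my_percpu_update(void)
    {
        critical_enter();       /* cheap: just a nesting count */
        /* ... touch per-CPU data; we cannot be preempted here ... */
        critical_exit();
    }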
|
#
143884 |
|
20-Mar-2005 |
rwatson |
Add a read-only kern.sched.preemption sysctl so that user space can tell if "options PREEMPTION" is compiled into the kernel.
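A sketch of how such a read-only sysctl is typically declared; the variable name approximates the description and is not verified against the commit:

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    SYSCTL_DECL(_kern_sched);

    #ifdef PREEMPTION
    static int my_sched_preemption = 1;
    #else
    static int my_sched_preemption = 0;
    #endif
    /* CTLFLAG_RD: user space can read the value but never set it. */
    SYSCTL_INT(_kern_sched, OID_AUTO, preemption, CTLFLAG_RD,
        &my_sched_preemption, 0, "kernel preemption compiled in");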
|
#
143757 |
|
17-Mar-2005 |
rwatson |
A further step on the journey of making panics and debugging more reliable: in the window between the beginning of panic() and entering the debugger, it's possible to receive interrupts. If we receive an interrupt, don't preempt if panicstr != NULL, as the system is in the process of failing, and the preempting thread is likely to stumble over the failure. The typical scenario is during the printf() in panic() prior to entering the debugger, especially when running with a slower console type such as a serial console.
It could be that the panic string should be passed to the debugger to print, so that it can run from the debugger's environment rather than a regular kernel printf.
Glanced at by: jhb
|
#
139804 |
|
06-Jan-2005 |
imp |
/* -> /*- for copyright notices, minor format tweaks as necessary
|
#
139315 |
|
26-Dec-2004 |
jeff |
- Define KTR points for KTR_SCHED.
|
#
138843 |
|
14-Dec-2004 |
jeff |
- Garbage collect several unused members of struct kse and struct ksegrp. As best as I can tell, some of these were never used.
|
#
137946 |
|
20-Nov-2004 |
das |
Remove local definitions of RANGEOF() and use __rangeof() instead. Also remove a few bogus casts.
|
#
137364 |
|
07-Nov-2004 |
rwatson |
Add basic critical section tracing to KTR using event type KTR_CRITICAL. This generates a KTR event for each critical section entered and exited.
It would be desirable to also log the filename and line number of the source entering or exiting the critical section, but this requires hacking up the critical section API, so I've not done that yet.
|
#
136583 |
|
16-Oct-2004 |
scottl |
If a process needs to be swapped in, wake up the swapper from within critical_exit as the process is getting scheduled to run. This is suboptimal, but for now it avoids the LOR between the scheduler and the sleepq systems. This is a 5.3 candidate.
Submitted by: davidxu MFC after: 3 days
|
#
136494 |
|
13-Oct-2004 |
ups |
Fix maybe_preempt_in_ksegrp for !SMP.
Tested by: tegge Reviewed by: julian Approved by: sam (mentor) MFC after: 3 days
|
#
136452 |
|
12-Oct-2004 |
phk |
Make !SMP kernels compile, and as far as I can tell, work again.
|
#
136438 |
|
12-Oct-2004 |
ups |
Prevent preemption in slot_fill. Implement preemption between threads in the same ksegrp in out-of-slot situations to prevent priority inversion.
Tested by: pho Reviewed by: jhb, julian Approved by: sam (mentor) MFC: ASAP
|
#
136345 |
|
10-Oct-2004 |
julian |
Don't release the slot twice.. sched_rem() has already done it.
Submitted by: stephan uphoff (ups at tree dot com) MFC after: 3 days
|
#
136170 |
|
05-Oct-2004 |
julian |
When preempting a thread, put it back on the HEAD of its run queue. (Only really implemented in 4bsd)
MFC after: 4 days
|
#
136167 |
|
05-Oct-2004 |
julian |
Use some macros to track available scheduler slots to allow easier debugging.
MFC after: 4 days
|
#
135470 |
|
19-Sep-2004 |
das |
The zone from which proc structures are allocated is marked UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should never be called. Move an assertion from proc_fini() to proc_dtor() and garbage-collect the rest of the unreachable code. I have retained vm_proc_dispose(), since I consider its disuse a bug.
|
#
135295 |
|
16-Sep-2004 |
julian |
clean up thread runq accounting a bit.
MFC after: 3 days
|
#
135291 |
|
16-Sep-2004 |
julian |
Use specific code to revert a partial add to the run queue, not remrunqueue(), which can't handle a partially added thread.
MFC after: 1 week
|
#
135255 |
|
15-Sep-2004 |
julian |
Oops, accidentally removed #ifdef SCHED_4BSD as part of another commit. This function is not yet used in ULE.
|
#
135182 |
|
13-Sep-2004 |
julian |
Commit a fix for some panics we've been seeing with preemption.
MFC after: 2 days
|
#
135181 |
|
13-Sep-2004 |
julian |
Add some kasserts
|
#
135051 |
|
10-Sep-2004 |
julian |
Add some code to allow threads to nominate a sibling to run if they are going to sleep.
MFC after: 1 week
|
#
134888 |
|
07-Sep-2004 |
julian |
Make debug printf less threatening and make it only print out once.
MFC after: 2 days
|
#
134837 |
|
06-Sep-2004 |
julian |
Don't do IPIs on behalf of interrupt threads. Just punt straight on through to the preemption code.
Make a KASSERT out of a condition that can no longer occur. MFC after: 1 week
|
#
134791 |
|
05-Sep-2004 |
julian |
Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces.
The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time.
The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own.
Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens.
Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure.
The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring.
A scheduler call sched_set_concurrency(kg, N) has been added that notifies the scheduler that no more than N threads from that ksegrp should be allowed to be concurrently scheduled. This is also used to enforce 'fairness' at this time, so that a ksegrp with 10000 threads can not swamp the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventually develop its own methods to do this now that they are effectively separated.
Rejig libthr's kernel interface to follow the same code paths as libkse for scope-system threads. This has slightly hurt libthr's performance, but I will work to recover as much of it as I can.
Thread exit code has been cleaned up greatly. Exit and exec code now transition a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
|
#
134665 |
|
02-Sep-2004 |
julian |
remove unused code
MFC after: 2 days
|
#
134649 |
|
02-Sep-2004 |
scottl |
Turn PREEMPTION into a kernel option. Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION warning in kern_switch.c). This is a possible MT5 candidate.
|
#
134591 |
|
01-Sep-2004 |
julian |
Give the 4bsd scheduler the ability to wake up idle processors when there is new work to be done.
MFC after: 5 days
|
#
134586 |
|
01-Sep-2004 |
julian |
Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them.
MFC after: 2 days
|
#
134417 |
|
28-Aug-2004 |
peter |
Backout the previous backout (with scott's ok). sched_ule.c:1.122 is believed to fix the problem with ULE that this change triggered.
|
#
134070 |
|
20-Aug-2004 |
scottl |
Revert the previous change. It works great for 4BSD but causes major problems for ULE. The reason is quite unknown and worrisome.
|
#
134067 |
|
20-Aug-2004 |
scottl |
In maybe_preempt(), ignore threads that are in an inconsistent state. This is an effective band-aid for at least some of the scheduler corruption seen recently. The real fix will involve protecting threads while they are inconsistent, and will come later.
Submitted by: julian
|
#
133414 |
|
10-Aug-2004 |
scottl |
Add a temporary debugging hack to detect a deadlock in setrunqueue(). This is here so that we can gather stats on the nature of the recent rash of hard lockups, and in this particular case panic the machine instead of letting it deadlock forever.
|
#
133404 |
|
09-Aug-2004 |
julian |
Make kg->kg_runnable actually count runnable threads in the ksegrp run queue instead of only doing it sometimes.. This is not used outside of debugging code in the current code, but that will probably change.
|
#
133396 |
|
09-Aug-2004 |
julian |
Increase the amount of data exported by KTR in the KTR_RUNQ setting. This extra data is needed to really follow what is going on in the threaded case.
|
#
133219 |
|
06-Aug-2004 |
jhb |
Don't scare users with a warning about preemption being off when it isn't yet safe to have on by default.
|
#
132700 |
|
27-Jul-2004 |
rwatson |
Pass a thread argument into cpu_critical_{enter,exit}() rather than dereference curthread. It is called only from critical_{enter,exit}(), which already dereferences curthread. This doesn't seem to affect SMP performance in my benchmarks, but improves MySQL transaction throughput by about 1% on UP on my Xeon.
Head nodding: jhb, bmilekic
|
#
132586 |
|
23-Jul-2004 |
scottl |
Remove the previous hack since it doesn't make a difference and is getting in the way of debugging.
|
#
132543 |
|
22-Jul-2004 |
scottl |
Disable the PREEMPTION-enabled code in critical_exit() that encourages switching to a different thread. This is just a hack to try to improve stability some more, but likely points closer to the real culprit.
|
#
132266 |
|
16-Jul-2004 |
jhb |
- Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags since they are only accessed by curthread and thus do not need any locking.
- Move pr_addr and pr_ticks out of struct uprof (which is per-process) and directly into struct thread as td_profil_addr and td_profil_ticks, as these variables are really per-thread. (They are used to defer an addupc_intr() that was too "hard" until ast()).
|
#
131927 |
|
10-Jul-2004 |
marcel |
Update for the KDB framework:
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_enter() instead of Debugger().
o Call kdb_backtrace() instead of db_print_backtrace() or backtrace().
kern_mutex.c:
o Replace checks for db_active with checks for kdb_active and make them unconditional.
kern_shutdown.c:
o s/DDB_UNATTENDED/KDB_UNATTENDED/g
o s/DDB_TRACE/KDB_TRACE/g
o Save the TID of the thread doing the kernel dump so the debugger knows which thread to select as the current when debugging the kernel core file.
o Clear kdb_active instead of db_active and do so unconditionally.
o Remove backtrace() implementation.
kern_synch.c:
o Call kdb_reenter() instead of db_error().
|
#
131508 |
|
03-Jul-2004 |
marcel |
Unbreak build for the !PREEMPTION case: don't define variables that aren't used in that case.
|
#
131481 |
|
02-Jul-2004 |
jhb |
Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue, since it is switched to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule(), as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption, since the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page-zeroing idle thread for architectures that support native preemption, as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64.
This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting.
Approved by: scottl (with his re@ hat)
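The decision structure described above, in a greatly simplified, era-approximate sketch (the real maybe_preempt() has more cases; my_maybe_preempt() is hypothetical and assumes the caller holds sched_lock, as setrunqueue() did then):

    #include <sys/param.h>
    #include <sys/proc.h>

    /*
     * Returns nonzero if it switched to 'td' directly, so the caller
     * (sched_add()) skips the run queue insertion entirely.
     */
    static int
    my_maybe_preempt(struct thread *td)
    {
        struct thread *ctd = curthread;

        /* Lower numeric priority is better in FreeBSD. */
        if (td->td_priority >= ctd->td_priority)
            return (0);             /* not high enough priority */
        if (ctd->td_critnest > 1) {
            /* Nested critical section: defer the preemption. */
            ctd->td_flags |= TDF_OWEPREEMPT;
            return (0);
        }
        mi_switch(SW_INVOL, td);    /* switch to the new thread now */
        return (1);
    }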
|
#
125315 |
|
02-Feb-2004 |
jeff |
- style fixes to the critical_exit() KASSERT().
Submitted by: bde
|
#
125286 |
|
01-Feb-2004 |
rwatson |
Move KASSERT regarding td_critnest to after the value of td is set to curthread, to avoid warning and incorrect behavior.
Hoped not to mind: jeff
|
#
125285 |
|
01-Feb-2004 |
jeff |
- Assert that td_critnest > 0 in critical_exit() to catch cases of unbalanced uses of the critical_* api.
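The assertion pattern, sketched as an illustrative placement at the top of critical_exit(); my_critical_exit_check() is hypothetical:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>

    /* Catch an unbalanced critical_exit() before the count underflows. */
    static void
    my_critical_exit_check(void)
    {
        KASSERT(curthread->td_critnest > 0,
            ("critical_exit: td_critnest out of sync (unbalanced use)"));
        curthread->td_critnest--;
    }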
|
#
123499 |
|
12-Dec-2003 |
rwatson |
Although sometimes to the uninitiated, it may seem like goup, KSEGOUP is actually spelt KSEGROUP. Go figure.
Reported by: samy@kerneled.com
|
#
122849 |
|
17-Nov-2003 |
peter |
Initial landing of SMP support for FreeBSD/amd64.
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts (in particular, bootstrap).
- This is still a WIP. It seems that there are some highly bogus BIOSes on nVidia nForce3-150 boards. I can't stress how broken these boards are. I have a workaround in mind, but right now the Asus SK8N is broken. The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- The apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device atpic' in addition, because they somehow managed to forget to connect the 8254 timer to the apic, even though it's in the same silicon! ARGH! This directly violates the ACPI spec.
|
#
121171 |
|
17-Oct-2003 |
jeff |
- Remove the correct thread from the run queue in setrunqueue(). This fixes ULE + KSE.
|
#
121127 |
|
16-Oct-2003 |
jeff |
- Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td argument rather than a kse.
|
#
116361 |
|
15-Jun-2003 |
davidxu |
Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.
|
#
116182 |
|
11-Jun-2003 |
obrien |
Use __FBSDID().
|
#
115215 |
|
21-May-2003 |
julian |
When we are spilling threads out of the run queue during panic, make sure we keep the thread state variable consistent with its real state. i.e. Don't say it's on the run queue when it isn't.
Also clarify the associated comment.
Turns a double panic back to a single panic :-/
Approved by: re@ (jhb)
|
#
112993 |
|
02-Apr-2003 |
peter |
Commit a partial lazy thread switch mechanism for i386. It isn't as lazy as it could be and can do with some more cleanup. Currently it's under options LAZY_SWITCH. What this does is avoid %cr3 reloads for short context switches that do not involve another user process, i.e. we can take an interrupt, switch to a kthread, and return to the user without explicitly flushing the TLB. However, this isn't as exciting as it could be; the interrupt overhead is still high and too much blocks on Giant still. There are some debug sysctls, for stats and for an on/off switch.
The main problem with doing this has been "what if the process that you're running on exits while we're borrowing its address space?" - in this case we use an IPI to give it a kick when we're about to reclaim the pmap.
It's not compiled in unless you add the LAZY_SWITCH option. I want to fix a few more things and get some more feedback before turning it on by default.
This is NOT a replacement for Bosko's lazy interrupt stuff. This was more meant for the kthread case, while his was for interrupts. Mine helps a little for interrupts, but his helps a lot more.
The stats are enabled with options SWTCH_OPTIM_STATS - this has been a pseudo-option for years, I just added a bunch of stuff to it.
One non-trivial change was to select a new thread before calling cpu_switch() in the first place. This allows us to catch the silly case of doing a cpu_switch() to the current process. This happens uncomfortably often. This simplifies a bit of the asm code in cpu_switch (no longer have to call choosethread() in the middle). This has been implemented on i386 and (thanks to jake) sparc64. The others will come soon. This is actually separate from the lazy switch stuff.
Glanced at by: jake, jhb
|
#
112397 |
|
19-Mar-2003 |
davidxu |
Adjust code for userland preemption. Userland can set a quantum in kse_mailbox to schedule an upcall; this is useful for a userland timeout routine, for example pthread_cond_timedwait().
Also extract the upcall scheduling code from kse_reassign and create a new function called thread_switchout to hold this code.
Reviewed by: julian
|
#
112021 |
|
09-Mar-2003 |
davidxu |
Cosmetic change: make it QUEUE_MACRO_DEBUG friendly.
|
#
111585 |
|
27-Feb-2003 |
julian |
Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.
|
#
111128 |
|
19-Feb-2003 |
davidxu |
Update comments to reflect new KSE code.
|
#
111041 |
|
17-Feb-2003 |
davidxu |
Move code for detecting PS_NEEDSIGCHK into thread_schedule_upcall; I think it is a better place to handle it.
|
#
111032 |
|
17-Feb-2003 |
julian |
Move a bunch of flags from the KSE to the thread. I was in two minds as to where to put them in the first case.. I should have listened to the other mind.
Submitted by: parts by davidxu@ Reviewed by: jeff@ mini@
|
#
111028 |
|
17-Feb-2003 |
jeff |
- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much of the KSE code.
Submitted by: davidxu
|
#
110190 |
|
01-Feb-2003 |
julian |
Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but them's the rules..
I am using my "David's mentor" hat to revert this as he's offline for a while.
|
#
109877 |
|
26-Jan-2003 |
davidxu |
Move the UPCALL-related data structure out of the kse and introduce a new data structure called kse_upcall to manage UPCALLs. All KSE binding and loaning code is gone.
A thread that owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and take those contexts back to userland. Any thread without an upcall structure has to export its context and exit at the user boundary.
Any thread running in user mode owns an upcall structure. When it enters the kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in the kernel, a new UPCALL thread is created and the upcall structure is transferred to the new UPCALL thread. If the kse mailbox's current thread pointer is NULL, then when a thread is blocked in the kernel, no UPCALL thread will be created.
Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit; when all upcalls in a ksegrp are removed, the group is automatically shut down. An upcall owner thread also exits when the process is in the exiting state; when an owner thread exits, the upcall it owns is also removed.
KSE is a pure scheduler entity. It represents a virtual CPU. When a thread is running, it always has a KSE associated with it. The scheduler is free to assign a KSE to a thread according to thread priority; if a thread's priority is changed, its KSE can be moved from one thread to another.
When a ksegrp is created, N KSEs are always created in the group, where N is the number of physical CPUs in the current system. This makes it possible that, even if a userland UTS is only single-CPU safe, threads in the kernel can still execute on different CPUs in parallel. Userland calls kse_create to add more upcall structures to the ksegrp to increase concurrency in userland itself; the kernel is not restricted by the number of upcalls userland provides.
The code hasn't been tested under SMP by the author due to lack of hardware.
Reviewed by: julian
|
#
109550 |
|
20-Jan-2003 |
julian |
Remove a KASSERT that can now happen and add a missing setrunnable.
|
#
108338 |
|
28-Dec-2002 |
julian |
Add code to ddb to allow backtracing an arbitrary thread. (show thread {address})
Remove the IDLE kse state and replace it with a change in the way threads share KSEs. Every KSE now has a thread, which is considered its "owner"; however, a KSE may also be lent to other threads in the same group to allow completion of in-kernel work. In this case the owner remains the same and the KSE will revert to the owner when the other work has been completed.
All creations of upcalls etc. is now done from kse_reassign() which in turn is called from mi_switch or thread_exit(). This means that special code can be removed from msleep() and cv_wait().
kse_release() does not leave a KSE with no thread any more but converts the existing thread into the KSE's owner, and sets it up for doing an upcall. It is just inhibited from being scheduled until there is some reason to do an upcall.
Remove all trace of the kse_idle queue since it is no longer needed. "Idle" KSEs are now on the loanable queue.
|
#
105129 |
|
14-Oct-2002 |
julian |
Did you ever notice how stupid bugs show up much clearer when you see them in a commit message?
|
#
105127 |
|
14-Oct-2002 |
julian |
Tidy up the scheduler's code for changing the priority of a thread. Logically pretty much a NOP.
|
#
104964 |
|
12-Oct-2002 |
jeff |
- Create a new scheduler api that is defined in sys/sched.h.
- Begin moving scheduler-specific functionality into sched_4bsd.c.
- Replace direct manipulation of scheduler data with hooks provided by the new api.
- Remove KSE-specific state modifications and single-runq assumptions from kern_switch.c.
Reviewed by: -arch
|
#
104695 |
|
09-Oct-2002 |
julian |
Round out the facility for a 'bound' thread to loan out its KSE in specific situations. The owner thread must be blocked, and the borrower can not proceed back to user space with the borrowed KSE. The borrower will return the KSE on the next context switch where the owner wants it back. This removes a lot of possible race conditions and deadlocks. It is conceivable that the borrower should inherit the priority of the owner too. That's another discussion and would be simple to do.
Also, as part of this, the "preallocated spare thread" is attached to the thread doing a syscall rather than the KSE. This removes the need to lock the scheduler when we want to access it, as it's now "at hand".
DDB now shows a lot more info for threaded processes, though it may need some optimisation to squeeze it all back into 80 chars again. (possible JKH project)
Upcalls are now "bound" threads, but "KSE Lending" now means that other completing syscalls can be completed using that KSE before the upcall finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is one of the highest priorities in the KSE system.) The upcall when it happens will present all the completed syscalls to the KSE for selection.
|
#
104392 |
|
03-Oct-2002 |
davidxu |
set ke_bound to NULL when the kse owner thread becomes runnable.
Reviewed by: julian (mentor)
|
#
104157 |
|
29-Sep-2002 |
julian |
Implement basic KSE loaning. This stops a thread that is blocked in BOUND mode from stopping another thread from completing a syscall, and this allows it to release its resources etc. Probably more related commits to follow (at least one I know of).
Initial concept by: julian, dillon Submitted by: davidxu
|
#
103832 |
|
23-Sep-2002 |
julian |
Indentation does not define a block.. you need braces {} as well.. also add a mutex assert. (threaded path only)
Submitted by: davidxu
|
#
103367 |
|
15-Sep-2002 |
julian |
Allocate KSEs and KSEGRPs separately and remove them from the proc structure. The next step is to allow > 1 to be allocated per process. This would give multi-processor threads (when the rest of the infrastructure is in place).
While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc are diverging more than they should.. corrective action needed soon.
|
#
103216 |
|
11-Sep-2002 |
julian |
Completely redo thread states.
Reviewed by: davidxu@freebsd.org
|
#
102592 |
|
30-Aug-2002 |
julian |
Rejig the code to figure out estcpu and work out how long a KSEGRP has been idle. What was there before was surprisingly ALMOST correct.
Peter and I fried our brains on this for a couple of hours figuring out what this actually means in the context of multiple threads.
Reviewed by: peter@freebsd.org
|
#
100913 |
|
30-Jul-2002 |
tanimura |
- Optimize wakeup() and its friends; if a thread being woken up is being swapped in, we do not have to ask the scheduler thread to do that.
- Assert that a process is not swapped out in runq functions and swapout().
- Introduce thread_safetoswapout() for readability.
- In swapout_procs(), perform a test that may block (check of a thread working on its vm map) first. This lets us call swapout() with the sched_lock held, providing a better atomicity.
|
#
100209 |
|
17-Jul-2002 |
gallatin |
Allow alphas to do crashdumps: Refuse to run anything in choosethread() after a panic which is not an interrupt thread, or the thread which caused the panic. Also, remove panicstr checks from msleep() and from cv_wait() in order to allow threads to go to sleep and yield the cpu to the panicing thread, or to an interrupt thread which might be doing the crashdump.
Reviewed by: jhb (and it was mostly his idea too)
|
#
99942 |
|
14-Jul-2002 |
julian |
Thinking about it I came to the conclusion that the KSE states were incorrectly formulated. The correct states should be:
IDLE: On the idle KSE list for that KSEG.
RUNQ: Linked onto the system run queue.
THREAD: Attached to a thread and slaved to whatever state the thread is in.
This means that most places where we were adjusting kse state can go away, as it is just moving around because the thread is.. The only places we need to adjust the KSE state are in transitions to and from the idle and run queues.
Reviewed by: jhb@freebsd.org
|
#
99889 |
|
12-Jul-2002 |
julian |
also set the KSE state for the idle KSE/thread case.
|
#
99887 |
|
12-Jul-2002 |
jhb |
Set the thread state of the newly chosen-to-run thread to TDS_RUNNING in choosethread() in MI C code instead of doing it in assembly in all the various cpu_switch() functions. This fixes problems on ia64 and sparc64.
Reviewed by: julian, peter, benno Tested on: i386, alpha, sparc64
|
#
99834 |
|
11-Jul-2002 |
julian |
Remove debugging code that I originally only wanted to be there for a couple of days after merge.
Reminded with pointy stick by: jhb
|
#
99072 |
|
29-Jun-2002 |
julian |
Part 1 of KSE-III
The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. To come: ia64 and power-pc patches, patches for gdb, test program (in tools).
Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
|
#
98469 |
|
20-Jun-2002 |
peter |
Move the "- 1" into the RQB_FFS(mask) macro itself so that implementations can provide a base zero ffs function if they wish. This changes #define RQB_FFS(mask) (ffs64(mask)) foo = RQB_FFS(mask) - 1; to #define RQB_FFS(mask) (ffs64(mask) - 1) foo = RQB_FFS(mask); On some platforms we can get the "- 1" for free, eg: those that use the C code for ffs64().
Reviewed by: jake (in principle)
|
#
97261 |
|
25-May-2002 |
jake |
Make the run queue parameters machine dependent. Optimize 64 bit architectures by using a 64 bit word for the bit array which keeps track of non-empty queues.
Reviewed by: peter
|
#
96209 |
|
08-May-2002 |
jake |
Remove runq_findproc. This never worked right in the first place and can be prohibitively expensive.
|
#
93607 |
|
01-Apr-2002 |
dillon |
Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter() and cpu_critical_exit() and moves associated critical prototypes into their own header file, <arch>/<arch>/critical.h, which is only included by the three MI source files that need it.
Back out and re-apply improperly committed syntactical cleanups made to files that were still under active development. Back out improperly committed program structure changes that moved localized declarations to the top of two procedures. Partially re-apply one of the program structure changes to move 'mask' into an intermediate block rather than in three separate sub-blocks to make the code more readable. Re-integrate bug fixes that Jake made to the sparc64 code.
Note: In general, developers should not gratuitously move declarations out of sub-blocks. They are where they are for reasons of structure, grouping, readability, compiler-localizability, and to avoid developer-introduced bugs similar to several found in recent years in the VFS and VM code.
Reviewed by: jake
|
#
93264 |
|
27-Mar-2002 |
dillon |
Compromise for critical*()/cpu_critical*() recommit. Clean up the interrupt disablement assumptions in kern_fork.c by adding another API call, cpu_critical_fork_exit(). Clean up the td_savecrit field by moving it from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h to <arch>/<arch>/critical.c (stage-2 will clean this up).
Implement interrupt deferral for i386 that allows interrupts to remain enabled inside critical sections. This also fixes an IPI interlock bug, and requires uses of icu_lock to be enclosed in a true interrupt disablement.
This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized, and will move cpu_critical*() into its own header file(s) + other things. This commit may break non-i386 architectures in trivial ways. This should be temporary.
Reviewed by: core Approved by: core
|
#
91751 |
|
06-Mar-2002 |
des |
Rename runq_find() to runq_findproc(), and hide it behind #ifdef DIAGNOSTIC, as it can have a severe impact on performance under high load, and the bug it was meant to catch was fixed ages ago.
|
#
91328 |
|
26-Feb-2002 |
dillon |
revert last commit temporarily due to whining on the lists.
|
#
91315 |
|
26-Feb-2002 |
dillon |
STAGE-1 of 3 commit - allow (but do not require) interrupts to remain enabled in critical sections and streamline critical_enter() and critical_exit().
This commit allows an architecture to leave interrupts enabled inside critical sections if it so wishes. Architectures that do not wish to do this are not affected by this change.
This commit implements the feature for the I386 architecture and provides a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For now you can turn the sysctl on and off at any time in order to test the architectural changes or track down bugs.
This commit is just the first stage. Some areas of the code, specifically the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will be cleaned up in the STAGE-2 commit when the critical_*() functions are moved entirely into MD files.
The following changes have been made:
* critical_enter() and critical_exit() for I386 now simply increment and decrement curthread->td_critnest. They no longer disable hard interrupts. When critical_exit() decrements the counter to 0 it effectively calls a routine to deal with whatever interrupts were deferred during the time the code was operating in a critical section.
Other architectures are unaffected.
* fork_exit() has been conditionalized to remove MD assumptions for the new code. Old code will still use the old MD assumptions in regards to hard interrupt disablement. In STAGE-2 this will be turned into a subroutine call into MD code rather than hardcoded in MI code.
The new code places the burden of entering the critical section in the trampoline code where it belongs.
* I386: interrupts are now enabled while we are in a critical section. The interrupt vector code has been adjusted to deal with the fact. If it detects that we are in a critical section it currently defers the interrupt by adding the appropriate bit to an interrupt mask.
* In order to accomplish the deferral, icu_lock is required. This is i386-specific. Thus icu_lock can only be obtained by mainline i386 code while interrupts are hard disabled. This change has been made.
* Because interrupts may or may not be hard disabled during a context switch, cpu_switch() can no longer simply assume that PSL_I will be in a consistent state. Therefore, it now saves and restores eflags.
* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred. The intention is to eventually allow them to operate either while we are in a critical section or, if we are able to restrict the use of sched_lock, while we are not holding the sched_lock.
* ICU and APIC vector assembly for I386 cleaned up. The ICU code has been cleaned up to match the APIC code in regards to format and macro availability. Additionally, the code has been adjusted to deal with deferred interrupts.
* Deferred interrupts use a per-cpu boolean int_pending, and masks ipending, spending, and fpending. Being per-cpu variables, it is not currently necessary to lock bus cycles modifying them.
Note that the same mechanism will enable preemption to be incorporated as a true software interrupt without having to further hack up the critical nesting code.
* Note: the old critical_enter() code in kern/kern_switch.c is currently #ifdef to be compatible with both the old and new methodology. In STAGE-2 it will be moved entirely to MD code.
Performance issues:
One of the purposes of this commit is to enhance critical section performance, specifically to greatly reduce bus overhead to allow the critical section code to be used to protect per-cpu caches. These caches, such as Jeff's slab allocator work, can potentially operate very quickly making the effective savings of the new critical section code's performance very significant.
The second purpose of this commit is to allow architectures to enable certain interrupts while in a critical section. Specifically, the intention is to eventually allow certain FAST interrupts to operate rather than defer.
The third purpose of this commit is to begin to clean up the critical_enter()/critical_exit()/cpu_critical_enter()/ cpu_critical_exit() API which currently has serious cross pollution in MI code (in fork_exit() and ast() for example).
The fourth purpose of this commit is to provide a framework that allows kernel-preempting software interrupts to be implemented cleanly. This is currently used for two forward interrupts in I386. Other architectures will have the choice of using this infrastructure or building the functionality directly into critical_enter()/ critical_exit().
Finally, this commit is designed to greatly improve the flexibility of various architectures to manage critical section handling, software interrupts, preemption, and other highly integrated architecture-specific details.
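A conceptual sketch of the deferral scheme described above, under the stated assumptions (interrupts stay enabled; a vector firing inside a critical section records itself in the per-cpu pending state instead of running); my_critical_exit() and my_unpend() are illustrative, not the committed i386 code:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/pcpu.h>

    static void my_unpend(void);    /* hypothetical: replay deferred vectors */

    static void
    my_critical_exit(void)
    {
        struct thread *td = curthread;

        if (td->td_critnest == 1) {
            td->td_critnest = 0;
            /* Outermost exit: run whatever interrupts were deferred
             * while td_critnest was nonzero (assumes an int_pending
             * field in the per-cpu data, as described above). */
            if (PCPU_GET(int_pending))
                my_unpend();
        } else
            td->td_critnest--;
    }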
|
#
90538 |
|
11-Feb-2002 |
julian |
In a threaded world, different priorities become properties of different entities. Make it so.
Reviewed by: jhb@freebsd.org (john baldwin)
|
#
88088 |
|
18-Dec-2001 |
jhb |
Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_ prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting count and a per-thread critical section saved state set when entering a critical section while at nesting level 0 and restored when exiting to nesting level 0. This moves the saved state out of spin mutexes so that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use cpu_critical_enter/exit. MI code such as device drivers and spin mutexes use the MI wrappers. Note that since the MI wrappers store the state in the current thread, they do not have any return values or arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is assigned to curthread->td_savecrit during fork_exit().
Tested on: i386, alpha
|
#
83601 |
|
18-Sep-2001 |
jlemon |
Change p into ke->ke_proc, this was hidden behind INVARIANTS.
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2. NOTE: ALL MODULES MUST BE RECOMPILED. Make the kernel aware that there are smaller units of scheduling than the process (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
79246 |
|
04-Jul-2001 |
jhb |
Spelling fix in a KASSERT: runq_chose -> runq_choose.
|
#
74914 |
|
28-Mar-2001 |
jhb |
Catch up to header include changes: - <sys/mutex.h> now requires <sys/systm.h> - <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
|
#
74282 |
|
15-Mar-2001 |
peter |
Jake essentially rewrote this. It is not by any stretch of the imagination a derivative of what I did before.
|
#
74113 |
|
11-Mar-2001 |
des |
Assert that the process we're trying to enqueue isn't already there.
|
#
74014 |
|
09-Mar-2001 |
jhb |
Add a new informative KASSERT to ensure that a process is in the SRUN state before we return it to cpu_switch().
|
#
72977 |
|
24-Feb-2001 |
jake |
- Assert that the proc to return is not NULL in runq_choose, the same as runq_remove.
- bzero the whole struct runq in runq_init just in case it's not statically allocated.
|
#
72376 |
|
12-Feb-2001 |
jake |
Implement a unified run queue and adjust priority levels accordingly.
- All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
|
#
70861 |
|
10-Jan-2001 |
jake |
Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables other than curproc.
|
#
67365 |
|
20-Oct-2000 |
jhb |
Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h
|
#
65900 |
|
15-Sep-2000 |
jhb |
Idle processes are always runnable, so let them stay at SRUN.
|
#
65764 |
|
11-Sep-2000 |
jhb |
Fix some printf format string warnings due to sizeof(int) != sizeof(long) on the alpha.
|
#
65557 |
|
07-Sep-2000 |
jasone |
Major update to the way synchronization is done in the kernel. Highlights include:
* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.)
* Per-CPU idle processes.
* Interrupts are run in their own separate kernel threads and can be preempted (i386 only).
Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
|
#
58717 |
|
28-Mar-2000 |
dillon |
Commit major SMP cleanups and move the BGL (big giant lock) in the syscall path inward. A system call may select whether it needs the MP lock or not (the default being that it does need it).
A great deal of conditional SMP code for various dead-ended experiments has been removed. 'cil' and 'cml' have been removed entirely, and the locking around the cpl has been removed. The conditional separately-locked fast-interrupt code has been removed, meaning that interrupts must hold the CPL now (but they pretty much had to anyway). Another reason for doing this is that the original separate-lock for interrupts just doesn't apply to the interrupt thread mechanism being contemplated.
Modifications to the cpl may now ONLY occur while holding the MP lock. For example, if an otherwise MP safe syscall needs to mess with the cpl, it must hold the MP lock for the duration and must (as usual) save/restore the cpl in a nested fashion.
This is precursor work for the real meat coming later: avoiding having to hold the MP lock for common syscalls and I/O's and interrupt threads. It is expected that the spl mechanisms and new interrupt threading mechanisms will be able to run in tandem, allowing a slow piecemeal transition to occur.
This patch should result in a moderate performance improvement due to the considerable amount of code that has been removed from the critical path, especially the simplification of the spl*() calls. The real performance gains will come later.
Approved by: jkh Reviewed by: current, bde (exception.s) Some work taken from: luoqi's patch
|
#
50056 |
|
19-Aug-1999 |
peter |
Fix a typo and a bug.
- One RTP_PRIO_REALTIME was meant to be RTP_PRIO_IDLE.
- RTP_PRIO_FIFO was not handled.
- Move the usual case first for setrunqueue() etc.
|
#
50027 |
|
19-Aug-1999 |
peter |
Extract the next runnable process selection out of cpu_switch() into a fairly machine-independent C routine. gcc actually does a pretty good job of this.
Reviewed by: msmith (in principle)
|