History log of /freebsd-9.3-release/sys/kern/kern_condvar.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 250581 12-May-2013 hiren

MFC: r240475

Remove all the checks on curthread != NULL with the exception of some MD
trap checks (eg. printtrap()).

Generally this check is not needed anymore, as there is not a legitimate
case where curthread != NULL, after pcpu 0 area has been properly
initialized.

Reviewed by: attilio
Approved by: sbruno (mentor)


# 237719 28-Jun-2012 jhb

MFC 234494:
Include the associated wait channel message for context switch ktrace
records. kdump supports both the old and new messages.


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 189071 26-Feb-2009 ed

Remove unused variables `p' and unneeded assignments of `rval'.

Found by: LLVM's scan-build


# 183352 25-Sep-2008 jhb

- Don't do a WITNESS_SAVE() on the interlock if it is Giant in the condition
variable wait routines. DROP_GIANT() already manages that state in the
Giant interlock case.
- Assert that Giant is held when it is passed as a sleep interlock.


# 181394 07-Aug-2008 jhb

Permit Giant to be passed as the explicit interlock either to
msleep/mtx_sleep or the various cv_*wait*() routines. Currently, the
"unlock" behavior of PDROP and cv_wait_unlock() with Giant is not
permitted as it is will be confusing since Giant is fully unrecursed and
unlocked during a thread sleep.

This is handy for subsystems which wish to allow unlocked drivers to
continue to use Giant such as CAM, the new TTY layer, and the new USB
stack. CAM currently uses a hack that I told Scott to use because I
really didn't want to permit this behavior, and the TTY and USB patches
both have various patches to permit this.

MFC after: 2 weeks


# 181334 05-Aug-2008 jhb

If a thread that is swapped out is made runnable, then the setrunnable()
routine wakes up proc0 so that proc0 can swap the thread back in.
Historically, this has been done by waking up proc0 directly from
setrunnable() itself via a wakeup(). When waking up a sleeping thread
that was swapped out (the usual case when waking proc0 since only sleeping
threads are eligible to be swapped out), this resulted in a bit of
recursion (e.g. wakeup() -> setrunnable() -> wakeup()).

With sleep queues having separate locks in 6.x and later, this caused a
spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).
An attempt was made to fix this in 7.0 by making the proc0 wakeup use
the ithread mechanism for doing the wakeup. However, this required
grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep
elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated
into the same LOR since the thread lock would be some other sleepq lock.

Fix this by deferring the wakeup of the swapper until after the sleepq
lock held by the upper layer has been locked. The setrunnable() routine
now returns a boolean value to indicate whether or not proc0 needs to be
woken up. The end result is that consumers of the sleepq API such as
*sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup
proc0 if they get a non-zero return value from sleepq_abort(),
sleepq_broadcast(), or sleepq_signal().

Discussed with: jeff
Glanced at by: sam
Tested by: Jurgen Weber jurgen - ish com au
MFC after: 2 weeks


# 177085 12-Mar-2008 jeff

- Pass the priority argument from *sleep() into sleepq and down into
sched_sleep(). This removes extra thread_lock() acquisition and
allows the scheduler to decide what to do with the static boost.
- Change the priority arguments to cv_* to match sleepq/msleep/etc.
where 0 means no priority change. Catch -1 in cv_broadcastpri() and
convert it to 0 for now.
- Set a flag when sleeping in a way that is compatible with swapping
since direct priority comparisons are meaningless now.
- Add a sysctl to ule, kern.sched.static_boost, that defaults to on which
controls the boost behavior. Turning it off gives better performance
in some workloads but needs more investigation.
- While we're modifying sleepq, change signal and broadcast to both
return with the lock held as the lock was held on enter.

Reviewed by: jhb, peter


# 170294 04-Jun-2007 jeff

Commit 2/14 of sched_lock decomposition.
- Adapt sleepqueues to the new thread_lock() mechanism.
- Delay assigning the sleep queue spinlock as the thread lock until after
we've checked for signals. It is illegal for a thread to return in
mi_switch() with any lock assigned to td_lock other than the scheduler
locks.
- Change sleepq_catch_signals() to do the switch if necessary to simplify
the callers.
- Simplify timeout handling now that locking a sleeping thread has the
side-effect of locking the sleepqueue. Some previous races are no
longer possible.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 169392 08-May-2007 jhb

Fix a potential LOR with sx_sleep() and cv_wait() with sx locks by
1) adding the thread to the sleepq via sleepq_add() before dropping the
lock, and 2) dropping the sleepq lock around calls to lc_unlock() for
sleepable locks (i.e. locks that use sleepq's in their implementation).


# 167789 21-Mar-2007 jhb

Rename the cv_*wait*() functions to _cv_*wait*() and change their second
argument from a mutex to a lock_object. Add cv_*wait*() wrapper macros
that accept either a mutex, rwlock, or sx lock as the second argument and
convert it to a lock_object and then call _cv_*wait*(). Basically, the
visible difference is that you can now use rwlocks and sx locks with
condition variables using the same API as with mutexes.


# 167787 21-Mar-2007 jhb

Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.


# 167786 21-Mar-2007 jhb

Don't use cv_wait_unlock() to implement cv_wait(). Instead, implement
cv_wait() fully and add missing KTRACE context switch traces.


# 165272 16-Dec-2006 kmacy

Add second sleep queue so that sx and lockmgr can have separate sleep
queues for shared and exclusive acquisitions

Submitted by: Attilio Rao
Approved by: jhb


# 164325 15-Nov-2006 pjd

Change sleepq_add(9) argument from 'struct mtx *' to 'struct lock_object *',
which allows to use it with different kinds of locks. For example it allows
to implement Solaris conditions variables which will be used in ZFS port on
top of sx(9) locks.

Reviewed by: jhb


# 155932 22-Feb-2006 davidxu

Fix a sleep queue race for KSE thread.

Reviewed by: jhb


# 155741 15-Feb-2006 davidxu

Fix a long standing race between sleep queue and thread
suspension code. When a thread A is going to sleep, it calls
sleepq_catch_signals() to detect any pending signals or thread
suspension request, if nothing happens, it returns without
holding process lock or scheduler lock, this opens a race
window which allows thread B to come in and do process
suspension work, however since A is still at running state,
thread B can do nothing to A, thread A continues, and puts
itself into actually sleeping state, but B has never seen it,
and it sits there forever until B is woken up by other threads
sometimes later(this can be very long delay or never
happen). Fix this bug by forcing sleepq_catch_signals to
return with scheduler lock held.
Fix sleepq_abort() by passing it an interrupted code, previously,
it worked as wakeup_one(), and the interruption can not be
identified correctly by sleep queue code when the sleeping
thread is resumed.
Let thread_suspend_check() returns EINTR or ERESTART, so sleep
queue no longer has to use SIGSTOP as a hack to build a return
value.

Reviewed by: jhb
MFC after: 1 week


# 153321 11-Dec-2005 rodrigc

Contributions from XFS for FreeBSD project:
- Implement cv_wait_unlock() method which has semantics compatible
with the sv_wait() method in IRIX. For cv_wait_unlock(), the lock
must be held before entering the function, but is not held when the
function is exited.

- Implement the existing cv_wait() function in terms of cv_wait_unlock().

Submitted by: kan
Feedback from: jhb, trhodes, Christoph Hellwig <hch at infradead dot org>


# 136445 12-Oct-2004 jhb

Refine the turnstile and sleep queue interfaces just a bit:
- Add a new _lock() call to each API that locks the associated chain lock
for a lock_object pointer or wait channel. The _lookup() functions now
require that the chain lock be locked via _lock() when they are called.
- Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup
the associated queue structure internally via _lookup() rather than
accepting a pointer from the caller. For turnstiles, this means that
the actual lookup of the turnstile in the hash table is only done when
the thread actually blocks rather than being done on each loop iteration
in _mtx_lock_sleep(). For sleep queues, this means that sleepq_lookup()
is no longer used outside of the sleep queue code except to implement an
assertion in cv_destroy().
- Change sleepq_broadcast() and sleepq_signal() to require that the chain
lock is already required. For condition variables, this lets the
cv_broadcast() and cv_signal() functions lock the sleep queue chain lock
while testing the waiters count. This means that the waiters count
internal to condition variables is no longer protected by the interlock
mutex and cv_broadcast() and cv_signal() now no longer require that the
interlock be held when they are called. This lets consumers of condition
variables drop the lock before waking other threads which can result in
fewer context switches.

MFC after: 1 month


# 134013 19-Aug-2004 jhb

Now that the return value semantics of cv's for multithreaded processes
have been unified with that of msleep(9), further refine the sleepq
interface and consolidate some duplicated code:
- Move the pre-sleep checks for theaded processes into a
thread_sleep_check() function in kern_thread.c.
- Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c.
Specifically, if a thread is awakened by something other than a signal
while checking for signals before going to sleep, clear TDF_SINTR in
sleepq_catch_signals(). This removes a sched_lock lock/unlock combo in
that edge case during an interruptible sleep. Also, fix
sleepq_check_signals() to properly handle the condition if TDF_SINTR is
clear rather than requiring the callers of the sleepq API to notice
this edge case and call a non-_sig variant of sleepq_wait().
- Clarify the flags arguments to sleepq_add(), sleepq_signal() and
sleepq_broadcast() by creating an explicit submask for sleepq types.
Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of
0. Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and
move the setting of TDF_SINTR to sleepq_add() if this flag is set rather
than sleepq_catch_signals(). Note that it is the caller's responsibility
to ensure that sleepq_catch_signals() is called if and only if this flag
is passed to the preceeding sleepq_add(). Note that this also removes a
sched_lock lock/unlock pair from sleepq_catch_signals(). It also ensures
that for an interruptible sleep, TDF_SINTR is always set when
TD_ON_SLEEPQ() is true.


# 133440 10-Aug-2004 jhb

Synchronize the extra SA threading checks and return value handling of
condition variables with that of msleep().

Reviewed by: davidxu


# 131249 28-Jun-2004 jhb

Remove the signal_caught argument from sleepq_timedwait() as it was
effectively always zero.


# 127954 06-Apr-2004 jhb

Associate a simple count of waiters with each condition variable. The
count is protected by the mutex that protects the condition, so the count
does not require any extra locking or atomic operations. It serves as an
optimization to avoid calling into the sleepqueue code at all if there are
no waiters.

Note that the count can get temporarily out of sync when threads sleeping
on a condition variable time out or are aborted. However, it doesn't hurt
to call the sleepqueue code for either a signal or a broadcast when there
are no waiters, and the count is never out of sync in the opposite
direction unless we have more than INT_MAX sleeping threads.


# 126885 12-Mar-2004 jhb

- Remove old sleep queues.
- Remove sleepqueue argument from sleepq_set_timeout() since it is not
used.


# 126326 27-Feb-2004 jhb

Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
and condition variables. Having sleep qeueus in a hash table avoids
having to allocate a queue head for each wait channel. Thus, struct cv
has shrunk down to just a single char * pointer now. However, the
hash table does not hold threads directly, but queue heads. This means
that once you have located a queue in the hash bucket, you no longer have
to walk the rest of the hash chain looking for threads. Instead, you have
a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
differentiates between cv's and sleep/wakeup. For example, calls to
abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and
cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
sleep's no longer inherently bump the priority. Instead, this is soley
a propery of msleep() which explicitly calls sched_prio() before
blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used. The
associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
dropped and replaced with a single explicit clearing of td_wchan.
TD_SET_ONSLEEPQ() would really have only made sense if it had taken
the wait channel and message as arguments anyway. Now that that only
happens in one place, a macro would be overkill.


# 124944 25-Jan-2004 jeff

- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL. Assert that one of these is set in mi_switch() and propery
adjust the rusage statistics. This is to simplify the large number of
users of this interface which were previously all required to adjust the
proper counter prior to calling mi_switch(). This also facilitates more
switch and locking optimizations.
- Change all callers of mi_switch() to pass the appropriate paramter and
remove direct references to the process statistics.


# 122352 09-Nov-2003 tanimura

- Implement selwakeuppri() which allows raising the priority of a
thread being waken up. The thread waken up can run at a priority as
high as after tsleep().

- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.

- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.

Not objected in: -arch, -current


# 117141 01-Jul-2003 davidxu

Allow SA process unblocks a thread blocked in condition variable.

Reviewed by: deischen


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 114983 13-May-2003 jhb

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


# 113625 17-Apr-2003 jhb

Test the P_WEXIT flag while already hold the proc lock instead of right
after dropping it.


# 112887 31-Mar-2003 julian

Do NOT return from an non-interruptable cv_wait, falsely
claiming to have timed out. I don't know what I was thinking..


# 111883 04-Mar-2003 jhb

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


# 111605 27-Feb-2003 harti

When a process has been waiting on a condition variable or mutex the
td_wmesg field in the thread structure points to the description string of
the condition variable or mutex. If the condvar or the mutex had been
initialized from a loadable module that was unloaded in the meantime,
td_wmesg may now point to invalid memory. Retrieving the process table now
may panic the kernel (or access junk). Setting the td_wmesg field to NULL
after unblocking on the condvar/mutex prevents this panic.

PR: kern/47408
Approved by: jake (mentor)


# 109862 26-Jan-2003 jeff

- Call sched_sleep() instead of rolling our own in cv_waitq_add().


# 108338 27-Dec-2002 julian

Add code to ddb to allow backtracing an arbitrary thread.
(show thread {address})

Remove the IDLE kse state and replace it with a change in
the way threads sahre KSEs. Every KSE now has a thread, which is
considered its "owner" however a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
n this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.

All creations of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().

kse_release() does not leave a KSE with no thread any more but
converts the existing thread into teh KSE's owner, and sets it up
for doing an upcall. It is just inhibitted from being scheduled until
there is some reason to do an upcall.

Remove all trace of the kse_idle queue since it is no-longer needed.
"Idle" KSEs are now on the loanable queue.


# 105911 25-Oct-2002 julian

More work on the interaction between suspending and sleeping threads.
Also clean up some code used with 'single-threading'.

Reviewed by: davidxu


# 104695 09-Oct-2002 julian

Round out the facilty for a 'bound' thread to loan out its KSE
in specific situations. The owner thread must be blocked, and the
borrower can not proceed back to user space with the borrowed KSE.
The borrower will return the KSE on the next context switch where
teh owner wants it back. This removes a lot of possible
race conditions and deadlocks. It is consceivable that the
borrower should inherit the priority of the owner too.
that's another discussion and would be simple to do.

Also, as part of this, the "preallocatd spare thread" is attached to the
thread doing a syscall rather than the KSE. This removes the need to lock
the scheduler when we want to access it, as it's now "at hand".

DDB now shows a lot mor info for threaded proceses though it may need
some optimisation to squeeze it all back into 80 chars again.
(possible JKH project)

Upcalls are now "bound" threads, but "KSE Lending" now means that
other completing syscalls can be completed using that KSE before the upcall
finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is
one of the highest priorities in the KSE system.) The upcall when it happens
will present all the completed syscalls to the KSE for selection.


# 103216 11-Sep-2002 julian

Completely redo thread states.

Reviewed by: davidxu@freebsd.org


# 102850 02-Sep-2002 davidxu

fix bogus CTR3 message.

Reviewed by: julian@freebsd.org (mentor)


# 102544 28-Aug-2002 peter

updatepri() works on a ksegrp (where the scheduling parameters are), so
directly give it the ksegrp instead of the thread. The only thing it used
to use in the thread was the ksegrp.

Reviewed by: julian


# 100971 30-Jul-2002 julian

Remove code that removes thread from sleep queue before
adding it to a condvar wait.
We do not have asleep() any more so this can not happen.


# 100921 30-Jul-2002 tanimura

In endtsleep() and cv_timedwait_end(), a thread marked TDF_TIMEOUT may
be swapped out. Do not put such the thread directly back to the run
queue.

Spotted by: David Xu <davidx@viasoft.com.cn>

While I am here, s/PS_TIMEOUT/TDF_TIMEOUT/.


# 100913 30-Jul-2002 tanimura

- Optimize wakeup() and its friends; if a thread waken up is being
swapped in, we do not have to ask for the scheduler thread to do
that.

- Assert that a process is not swapped out in runq functions and
swapout().

- Introduce thread_safetoswapout() for readability.

- In swapout_procs(), perform a test that may block (check of a
thread working on its vm map) first. This lets us call swapout()
with the sched_lock held, providing a better atomicity.


# 100884 29-Jul-2002 julian

Create a new thread state to describe threads that would be ready to run
except for the fact tha they are presently swapped out. Also add a process
flag to indicate that the process has started the struggle to swap
back in. This will be needed for the case where multiple threads
start the swapin action top a collision. Also add code to stop
a process fropm being swapped out if one of the threads in this
process is actually off running on another CPU.. that might hurt...

Submitted by: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>


# 100209 17-Jul-2002 gallatin

Allow alphas to do crashdumps: Refuse to run anything in choosethread()
after a panic which is not an interrupt thread, or the thread which
caused the panic. Also, remove panicstr checks from msleep() and from
cv_wait() in order to allow threads to go to sleep and yeild the cpu
to the panicing thread, or to an interrupt thread which might
be doing the crashdump.

Reviewed by: jhb (and it was mostly his idea too)


# 99247 02-Jul-2002 julian

Fix failure to correctly transition back to sleep mode.


# 99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 97995 07-Jun-2002 jhb

- Catch up to new ktrace API.
- ktrace trace points in msleep() and cv_wait() no longer need Giant.


# 97526 29-May-2002 julian

CURSIG() is not a macro so rename it cursig().

Obtained from: KSE tree


# 95322 23-Apr-2002 hsu

The cold and panicstr variables do not need to be protected by sched_lock.

Submitted by: Jennifer Yang (yangjihui@yahoo.com)
Reviewed by: jake & jhb in principle


# 93408 30-Mar-2002 dan

Nuke CV_DEBUG in favour of INVARIANTS.

Approved by: jhb


# 90538 11-Feb-2002 julian

In a threaded world, differnt priorirites become properties of
different entities. Make it so.

Reviewed by: jhb@freebsd.org (john baldwin)


# 88900 05-Jan-2002 jhb

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


# 87594 10-Dec-2001 obrien

Update to C99, s/__FUNCTION__/__func__/.


# 83658 19-Sep-2001 peter

Add missing ; in last commit

Pointy-hat-to: jhb


# 83650 18-Sep-2001 jhb

Use a 'p' variable instead of repetitively indirecting td->td_proc for
signal things that are still per-process and won't be per-thread.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 82085 21-Aug-2001 jhb

- Fix a bug in the previous workaround for the tsleep/endtsleep race.
callout_stop() would fail in two cases:
1) The timeout was currently executing, and
2) The timeout had already executed.
We only needed to work around the race for 1). We caught some instances
of 2) via the PS_TIMEOUT flag, however, if endtsleep() fired after the
process had been woken up but before it had resumed execution,
PS_TIMEOUT would not be set, but callout_stop() would fail, so we
would block the process until endtsleep() resumed it. Except that
endtsleep() had already run and couldn't resume it. This adds a new flag
PS_TIMOFAIL to indicate the case of 2) when PS_TIMEOUT isn't set.
- Implement this race fix for condition variables as well.

Tested by: sos


# 79343 05-Jul-2001 jake

Backout mwakeup, etc.


# 79172 03-Jul-2001 jake

Implement mwakeup, mwakeup_one, cv_signal_drop and cv_broadcast_drop.
These take an additional mutex argument, which is dropped before any
processes are made runnable. This can avoid contention on the mutex
if the processes would immediately acquire it, and is done in such a
way that wakeups will not be lost.

Reviewed by: jhb


# 78637 22-Jun-2001 jhb

- Lock CURSIG with the proc lock and don't release the proc lock until
after grabbing the sched lock to close a race.
- Lock ktrace points with Giant.


# 76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 74920 28-Mar-2001 jhb

Pass in a pointer to the mutex's lock_object as the second argument to
WITNESS_SLEEP() rather than the mutex itself.


# 74912 28-Mar-2001 jhb

Rework the witness code to work with sx locks as well as mutexes.
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
additional lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatability,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that blocks a sx lock for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.


# 73925 07-Mar-2001 jhb

Use the proc lock to protect access to p_sigacts->ps_sigintr.


# 72376 11-Feb-2001 jake

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# 72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 71557 24-Jan-2001 jhb

Catch up to P_FOO -> PS_FOO changes in proc flags.


# 71088 15-Jan-2001 jasone

Implement condition variables.