History log of /freebsd-10.1-release/sys/kern/kern_umtx.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 270789 29-Aug-2014 kib

MFC r270345:
In do_lock_pi(), do not override error from umtxq_sleep_pi() when
doing suspend check.


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 251684 13-Jun-2013 kib

Fix two issues with the spin loops in the umtx(2) implementation.

- When looping, check for the pending suspension. Otherwise, other
usermode thread which races with the looping one, could try to
prevent the process from stopping or exiting.

- Add missed checks for the faults from casuword*(). The code is
structured in a way which makes the loops exit if the specified
address is invalid, since both fuword() and casuword() return -1 on
the fault. But if the address is mapped readonly, the typical value
read by fuword() is different from -1, while casuword() returns -1.
Absent the checks for casuword() faults, this is interpreted as the
race with other thread and causes non-interruptible spinning in the
kernel.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 249644 19-Apr-2013 jilles

sem: Restart the POSIX sem_* calls after signals with SA_RESTART set.

Programs often do not expect an [EINTR] return from sem_wait() and POSIX
only allows it if the signal was installed without SA_RESTART. The timeout
in sem_timedwait() is absolute so it can be restarted normally.

The umtx call can be invoked with a relative timeout and in that case
[ERESTART] must be changed to [EINTR]. However, libc does not do this.

The old POSIX semaphore implementation did this correctly (before r249566),
unlike the new umtx one.

It may be desirable to avoid [EINTR] completely, which matches the pthread
functions and is explicitly permitted by POSIX. However, the kernel must
return [EINTR] at least for signals with SA_RESTART clear, otherwise pthread
cancellation will not abort a semaphore wait. In this commit, only restore
the 8.x behaviour which is also permitted by POSIX.

Discussed with: jhb
MFC after: 1 week


# 248591 21-Mar-2013 attilio

Fix a bug in UMTX_PROFILING:
UMTX_PROFILING should really analyze the distribution of locks as they
index entries in the umtxq_chains hash-table.
However, the current implementation does add/dec the length counters
for *every* thread insert/removal, measuring at all really userland
contention and not the hash distribution.

Fix this by correctly add/dec the length counters in the points where
it is really needed.

Please note that this bug brought us questioning in the past the quality
of the umtx hash table distribution.
To date with all the benchmarks I could try I was not able to reproduce
any issue about the hash distribution on umtx.

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff, davide
MFC after: 2 weeks


# 248105 09-Mar-2013 attilio

Improve UMTX_PROFILING:
- Use u_int values for length and max_length values
- Add a way to reset the max_length heuristic in order to have the
possibility to reuse the mechanism consecutively without rebooting
the machine
- Add a way to quick display top5 contented buckets in the system for
the max_length value.
This should give a quick overview on the quality of the hash table
distribution.

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff, davide


# 242202 27-Oct-2012 davide

The fields of struct timespec32 should be int32_t and not uint32_t.
Make this change.

Reviewed by: bde, davidxu
Tested by: pho
MFC after: 1 week


# 239202 11-Aug-2012 davidxu

Some style fixes inspired by @bde.


# 239187 10-Aug-2012 davidxu

tvtohz will print out an error message if a negative value is given
to it, avoid this problem by detecting timeout earlier.

Reported by: pho


# 234302 14-Apr-2012 davide

Fix some style bugs introduced in a previous commit (r233045)

Reported by: glebius, jmallet
Reviewed by: jmallet
Approved by: gnn (mentor)
MFC after: 2 days


# 233913 05-Apr-2012 davidxu

In sem_post, the field _has_waiters is no longer used, because some
application destroys semaphore after sem_wait returns. Just enter
kernel to wake up sleeping threads, only update _has_waiters if
it is safe. While here, check if the value exceed SEM_VALUE_MAX and
return EOVERFLOW if this is true.


# 233912 05-Apr-2012 davidxu

umtx operation UMTX_OP_MUTEX_WAKE has a side-effect that it accesses
a mutex after a thread has unlocked it, it event writes data to the mutex
memory to clear contention bit, there is a race that other threads
can lock it and unlock it, then destroy it, so it should not write
data to the mutex memory if there isn't any waiter.
The new operation UMTX_OP_MUTEX_WAKE2 try to fix the problem. It
requires thread library to clear the lock word entirely, then
call the WAKE2 operation to check if there is any waiter in kernel,
and try to wake up a thread, if necessary, the contention bit is set again
by the operation. This also mitgates the chance that other threads find
the contention bit and try to enter kernel to compete with each other
to wake up sleeping thread, this is unnecessary. With this change, the
mutex owner is no longer holding the mutex until it reaches a point
where kernel umtx queue is locked, it releases the mutex as soon as
possible.
Performance is improved when the mutex is contensted heavily. On Intel
i3-2310M, the runtime of a benchmark program is reduced from 26.87 seconds
to 2.39 seconds, it even is better than UMTX_OP_MUTEX_WAKE which is
deprecated now. http://people.freebsd.org/~davidxu/bench/mutex_perf.c


# 233729 31-Mar-2012 davidxu

Remove stale comments.


# 233700 30-Mar-2012 davidxu

Remove trailing semicolon, it is a typo.


# 233693 30-Mar-2012 davidxu

Fix COMPAT_FREEBSD32 build.

Submitted by: Andreas Tobler < andreast at fgznet dot ch >


# 233691 30-Mar-2012 davidxu

Remove trailing space.


# 233690 30-Mar-2012 davidxu

Merge umtxq_sleep and umtxq_nanosleep into a single function by using
an abs_timeout structure which describes timeout info.


# 233642 29-Mar-2012 davidxu

Reduce code size by creating common timed sleeping function.


# 233045 16-Mar-2012 davide

Add rudimentary profiling of the hash table used in the in the umtx code to
hold active lock queues.

Reviewed by: attilio
Approved by: davidxu, gnn (mentor)
MFC after: 3 weeks


# 232286 29-Feb-2012 davidxu

initialize clock ID and flags only when copying timespec, a _umtx_time
copy already contains these fields.


# 232209 27-Feb-2012 davidxu

Follow changes made in revision 232144, pass absolute timeout to kernel,
this eliminates a clock_gettime() syscall.


# 232144 25-Feb-2012 davidxu

In revision 231989, we pass a 16-bit clock ID into kernel, however
according to POSIX document, the clock ID may be dynamically allocated,
it unlikely will be in 64K forever. To make it future compatible, we
pack all timeout information into a new structure called _umtx_time, and
use fourth argument as a size indication, a zero means it is old code
using timespec as timeout value, but the new structure also includes flags
and a clock ID, so the size argument is different than before, and it is
non-zero. With this change, it is possible that a thread can sleep
on any supported clock, though current kernel code does not have such a
POSIX clock driver system.


# 231995 22-Feb-2012 davidxu

Fix typo.


# 231989 22-Feb-2012 davidxu

Use unused fourth argument of umtx_op to pass flags to kernel for operation
UMTX_OP_WAIT. Upper 16bits is enough to hold a clock id, and lower
16bits is used to pass flags. The change saves a clock_gettime() syscall
from libthr.


# 230194 16-Jan-2012 davidxu

Eliminate branch and insert an explicit reader memory barrier to ensure
that waiter bit is set before reading semaphore count.


# 228219 03-Dec-2011 pho

Add umtx_copyin_timeout() and move parameter checks here.

In collaboration with: kib
MFC after: 1 week


# 228218 03-Dec-2011 pho

Rename copyin_timeout32 to umtx_copyin_timeout32 and move parameter
check here. Include check for negative seconds value.

In collaboration with: kib
MFC after: 1 week


# 227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 218969 23-Feb-2011 jhb

Expose the umtx_key structure and API to the rest of the kernel.

MFC after: 3 days


# 216791 29-Dec-2010 davidxu

- Follow r216313, the sched_unlend_user_prio is no longer needed, always
use sched_lend_user_prio to set lent priority.
- Improve pthread priority-inherit mutex, when a contender's priority is
lowered, repropagete priorities, this may cause mutex owner's priority
to be lowerd, in old code, mutex owner's priority is rise-only.


# 216678 23-Dec-2010 davidxu

Enlarge hash table for new condition variable.


# 216641 22-Dec-2010 davidxu

MFp4:

- Add flags CVWAIT_ABSTIME and CVWAIT_CLOCKID for umtx kernel based
condition variable, this should eliminate an extra system call to get
current time.

- Add sub-function UMTX_OP_NWAKE_PRIVATE to wake up N channels in single
system call. Create userland sleep queue for condition variable, in most
cases, thread will wait in the queue, the pthread_cond_signal will defer
thread wakeup until the mutex is unlocked, it tries to avoid an extra
system call and a extra context switch in time window of pthread_cond_signal
and pthread_mutex_unlock.

The changes are part of process-shared mutex project.


# 216463 15-Dec-2010 mdf

One of the compat32 functions was copying in a raw timespec, instead of
a 32-bit one. This can cause weird timeout issues, as the copying reads
garbage from the user.

Code by: Deepak Veliath <deepak dot veliath at isilon dot com>
MFC after: 1 week


# 216313 09-Dec-2010 davidxu

MFp4:
It is possible a lower priority thread lending priority to higher priority
thread, in old code, it is ignored, however the lending should always be
recorded, add field td_lend_user_pri to fix the problem, if a thread does
not have borrowed priority, its value is PRI_MAX.

MFC after: 1 week


# 215652 22-Nov-2010 davidxu

Use atomic instruction to set _has_writer, otherwise there is a race
causes userland to not wake up a thread sleeping in kernel.

MFC after: 3 days


# 215336 15-Nov-2010 davidxu

Only unlock process if a thread is found.


# 213642 09-Oct-2010 davidxu

Create a global thread hash table to speed up thread lookup, use
rwlock to protect the table. In old code, thread lookup is done with
process lock held, to find a thread, kernel has to iterate through
process and thread list, this is quite inefficient.
With this change, test shows in extreme case performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian


# 211794 25-Aug-2010 davidxu

If a thread is removed from umtxq while sleeping, reset error code
to zero, this gives userland a better indication that a thread needn't
to be cancelled.


# 209390 21-Jun-2010 ed

Use ISO C99 integer types in sys/kern where possible.

There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.


# 205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


# 203744 10-Feb-2010 davidxu

In function umtxq_insert_queue, use parameter q (shared/exclusive queue)
instead of hard coded constant. This does not affect RELENG_8 and previous,
because the code only exists in the HEAD.


# 203657 08-Feb-2010 davidxu

Set waiters flag before checking semaphore's counter,
otherwise we might lose a wakeup. Tested on postgresql database server.


# 203419 03-Feb-2010 davidxu

Fix comments in do_sem_wait().


# 203414 03-Feb-2010 davidxu

After busied the lock, re-read state word before checking waiters flag,
otherwise, the waiters bit may not be set and a wakeup is lost.

Submitted by: justin.teller at gmail dot com
MFC after: 3 days


# 201991 10-Jan-2010 davidxu

Make a chain be a list of queues, and make threads waiting
for same key coalesce to same queue, this makes searching
path shorter and improves performance.
Also fix comments about shared PI-mutex.


# 201887 09-Jan-2010 davidxu

Use enum to define key types.

Suggested by: jmallett


# 201886 09-Jan-2010 davidxu

put semaphore waiter in long term list.


# 201885 09-Jan-2010 davidxu

Add key type TYPE_SEM.


# 201472 04-Jan-2010 davidxu

Add user-level semaphore synchronous type, this change allows multiple
processes to share semaphore by using shared memory area, in simplest case,
only one atomic operation is needed in userland, waiter flag is maintained by
kernel and userland only checks the flag, if the flag is set, user code enters
kernel and does a wakeup() call.
Move type definitions into file _umtx.h to minimize compiling time.
Also type names need to be prefixed with underline character, this would reduce
name conflict (still in progress).


# 197476 24-Sep-2009 davidxu

In function do_rw_wrlock, when a writer got an error and before returning,
check if there are readers blocked by us via URWLOCK_WRITE_WAITERS flag,
and resume the readers. The error must be EAGAIN, otherwise there must
have memory problem, and nobody can rescue the buggy application.

The revision 197445 might be reverted.


# 190987 13-Apr-2009 davidxu

Make UMTX_OP_WAIT_UINT actually wait for an unsigned integer on 64-bits
machine.

MFC after: 1 week


# 189756 13-Mar-2009 davidxu

1) Check NULL pointer before calling umtx_pi_adjust_locked(), this avoids
a PANIC.
2) Rework locking for POSIX priority-mutex, this fixes a
race where a thread may wait there forever even if the mutex is unlocked.


# 179970 24-Jun-2008 davidxu

Add two commands to _umtx_op system call to allow a simple mutex to be
locked and unlocked completely in userland. by locking and unlocking mutex
in userland, it reduces the total time a mutex is locked by a thread,
in some application code, a mutex only protects a small piece of code, the
code's execution time is less than a simple system call, if a lock contention
happens, however in current implemenation, the lock holder has to extend its
locking time and enter kernel to unlock it, the change avoids this disadvantage,
it first sets mutex to free state and then enters kernel and wake one waiter
up. This improves performance dramatically in some sysbench mutex tests.

Tested by: kris
Sounds great: jeff


# 179421 30-May-2008 davidxu

Use a seperated hash table for mutex and rwlock, avoid wasting some time
on walking through idle threads sleeping on condition variables.


# 178646 29-Apr-2008 davidxu

Introduce command UMTX_OP_WAIT_UINT_PRIVATE and UMTX_OP_WAKE_PRIVATE
to allow userland to specify that an address is not shared by multiple
processes.


# 177880 03-Apr-2008 davidxu

let umtxq_busy() only spin on mp machine. make function name
do_rwlock_unlock to be consistent with others.


# 177852 02-Apr-2008 davidxu

Fix compiling problem for amd64.


# 177849 02-Apr-2008 davidxu

Er, don't restart a timeout version.


# 177848 02-Apr-2008 davidxu

Introduce kernel based userland rwlock. Each umtx chain now has two lists,
one for readers and one for writers, other types of synchronization
object just use first list.

Asked by: jeff


# 174707 17-Dec-2007 davidxu

Check NULL pointer.


# 174701 17-Dec-2007 davidxu

Add missing changes for fixing LOR of umtx lock and thread lock, follow
the committing of files:
kern_resource.c revision 1.181
sched_4bsd.c revision 1.111
sched_ule.c revision 1.218


# 173800 21-Nov-2007 davidxu

Add function UMTX_OP_WAIT_UINT, the function causes thread to wait for
an integer to be changed.


# 170368 06-Jun-2007 davidxu

Backout experimental adaptive-spin umtx code.


# 170300 04-Jun-2007 jeff

Commit 8/14 of sched_lock decomposition.
- Use a global umtx spinlock to protect the sleep queues now that there
is no global scheduler lock.
- Use thread_lock() to protect thread state.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 167232 05-Mar-2007 rwatson

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 165369 20-Dec-2006 davidxu

Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allow us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it hardly
can be implemented efficiently without wasting cpu cycles, however
exporting thread running state unlikely will be implemented soon as
it has to design and stablize interfaces. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used a spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130


# 164936 06-Dec-2006 julian

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


# 164876 04-Dec-2006 davidxu

if a thread blocked on userland condition variable is
pthread_cancel()ed, it is expected that the thread will not
consume a pthread_cond_signal(), therefor, we use thr_wake()
to mark a flag, the flag tells a thread calling do_cv_wait()
in umtx code to not block on a condition variable.
Thread library is expected that once a thread detected itself
is in pthread_cond_wait, it will call the thr_wake() for itself
in its SIGCANCEL handler.


# 164839 02-Dec-2006 davidxu

Introduce userspace condition variable, since we have already POSIX
priority mutex implemented, it is the time to introduce this stuff,
now we can use umutex and ucond together to implement pthread's
condition wait/signal.


# 164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# 163697 26-Oct-2006 davidxu

Optimize umtx_lock_pi() a bit by moving some heavy code out of the loop,
make a fast path when a umtx_pi can be allocated without being blocked.


# 163678 25-Oct-2006 davidxu

In order to eliminate a branch, convert opcode to unsigned integer.


# 163677 25-Oct-2006 davidxu

Eliminate an unnecessary `if' statement.


# 163449 17-Oct-2006 davidxu

o Add keyword volatile for user mutex owner field.
o Fix type consistent problem by using type long for old
umtx and wait channel.
o Rename casuptr to casuword.


# 163046 06-Oct-2006 davidxu

Implement 32bit umtx_lock and umtx_unlock system calls, these two system
calls are not used by libthr in RELENG_6 and HEAD, it is only used by
the libthr in RELENG-5, the _umtx_op system call can do more incremental
dirty works than these two system calls without having to introduce new
system calls or throw away old system calls when things are going on.


# 162550 22-Sep-2006 davidxu

Fix umtx command order error for freebsd 32bit.


# 162536 21-Sep-2006 davidxu

Add umtx support for 32bit process on AMD64 machine.


# 162030 05-Sep-2006 davidxu

Merge all code of do_lock_normal, do_lock_pi and do_lock_pp into
function do_lock_umutex.


# 161926 02-Sep-2006 davidxu

Check if it is root user in do_unlock_pp.


# 161855 02-Sep-2006 davidxu

Make sure we get new m_owner value if we can not unlock it in
uncontested case. Reorder statements in do_unlock_umutex.


# 161742 30-Aug-2006 davidxu

Reorder some statments. Fix typo and remove stale comments.


# 161684 28-Aug-2006 davidxu

Update comments about interrupted mutex locking.


# 161678 28-Aug-2006 davidxu

This is initial version of POSIX priority mutex support, a new userland
mutex structure is added as following:
struct umutex {
__lwpid_t m_owner;
uint32_t m_flags;
uint32_t m_ceilings[2];
uint32_t m_spare[4];
};
The m_owner represents owner thread, it is a thread id, in non-contested
case, userland can simply use atomic_cmpset_int to lock the mutex, if the
mutex is contested, high order bit will be set, and userland should do locking
and unlocking via kernel syscall. Flag UMUTEX_PRIO_INHERIT represents
pthread's PTHREAD_PRIO_INHERIT mutex, which when contention happens, kernel
should do priority propagating. Flag UMUTEX_PRIO_PROTECT indicates it is
pthread's PTHREAD_PRIO_PROTECT mutex, userland should initialize m_owner
to contested state UMUTEX_CONTESTED, then atomic_cmpset_int will be failure
and kernel syscall should be invoked to do locking, this becauses
for such a mutex, kernel should always boost the thread's priority before
it can lock the mutex, m_ceilings is used by PTHREAD_PRIO_PROTECT mutex,
the first element is used to boost thread's priority when it locked the mutex,
second element is used when the mutex is unlocked, the PTHREAD_PRIO_PROTECT
mutex's link list is kept in userland, the m_ceiling[1] is managed by thread
library so kernel needn't allocate memory to keep the link list, when such
a mutex is unlocked, kernel reset m_owner to UMUTEX_CONTESTED.
Flag USYNC_PROCESS_SHARED indicate if the synchronization object is process
shared, if the flag is not set, it saves a vm_map_lookup() call.

The umtx chain is still used as a sleep queue, when a thread is blocked on
PTHREAD_PRIO_INHERIT mutex, a umtx_pi is allocated to support priority
propagating, it is dynamically allocated and reference count is used,
it is not optimized but works well in my tests, while the umtx chain has
its own locking protocol, the priority propagating protocol are all protected
by sched_lock because priority propagating function is called with sched_lock
held from scheduler.

No visible performance degradation is found which these changes. Some parameter
names in _umtx_op syscall are renamed.


# 161599 25-Aug-2006 davidxu

Add user priority loaning code to support priority propagation for
1:1 threading's POSIX priority mutexes, the code is no-op unless
priority-aware umtx code is committed.


# 158718 18-May-2006 davidxu

Move flag TDF_UMTXQ into structure umtxq, this eliminates the requirement
of scheduler lock in some umtx code.


# 158377 09-May-2006 davidxu

Use wakeup_one to avoid thundering herd.

Tested by: kris


# 157815 17-Apr-2006 jhb

Change msleep() and tsleep() to not alter the calling thread's priority
if the specified priority is zero. This avoids a race where the calling
thread could read a snapshot of it's current priority, then a different
thread could change the first thread's priority, then the original thread
would call sched_prio() inside msleep() undoing the change made by the
second thread. I used a priority of zero as no thread that calls msleep()
or tsleep() should be specifying a priority of zero anyway.

The various places that passed 'curthread->td_priority' or some variant
as the priority now pass 0.


# 155276 04-Feb-2006 davidxu

Axe unused code.


# 151692 26-Oct-2005 davidxu

do umtx_wake at userland thread exit address, so that others userland
threads can wait for a thread to exit, and safely assume that the thread
has left userland and is no longer using its userland stack, this is
necessary for pthread_join when a thread is waiting for another thread
to exit which has user customized stack, after pthread_join returns,
the userland stack can be reused for other purposes, without this change,
the joiner thread has to spin at the address to ensure the thread is really
exited.


# 143149 05-Mar-2005 davidxu

Allocate umtx_q from heap instead of stack, this avoids
page fault panic in kernel under heavy swapping.


# 140421 18-Jan-2005 davidxu

Revert my previous errno hack, that is certainly an issue,
and always has been, but the system call itself returns
errno in a register so the problem is really a function of
libc, not the system call.

Discussed with : Matthew Dillion <dillon@apollo.backplane.com>


# 140245 14-Jan-2005 davidxu

make umtx timeout relative so userland can select different clock type,
e.g, CLOCK_REALTIME or CLOCK_MONOTONIC.
merge umtx_wait and umtx_timedwait into single function.


# 140110 12-Jan-2005 phk

Comment out debugging printf which doesn't compile on amd64.


# 140102 12-Jan-2005 davidxu

Let _umtx_op directly return error code rather than from errno because
errno can be tampered potentially by nested signal handle.
Now all error codes are returned in negative value, positive value are
reserved for future expansion.


# 139899 08-Jan-2005 davidxu

Break out of loop earlier if it is not timeout.


# 139804 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 139751 06-Jan-2005 davidxu

Return ETIMEDOUT when thread is timeouted since POSIX thread
APIs expect ETIMEDOUT not EAGAIN, this simplifies userland code a
bit.


# 139427 30-Dec-2004 davidxu

Make umtx_wait and umtx_wake more like linux futex does, it is
more general than previous. It also lets me implement cancelable point
in thread library. Also in theory, umtx_lock and umtx_unlock can
be implemented by using umtx_wait and umtx_wake, all atomic operations
can be done in userland without kernel's casuptr() function.


# 139292 25-Dec-2004 davidxu

Make _umtx_op() as more general interface, the final parameter needn't be
timespec pointer, every parameter will be interpreted by its opcode.


# 139291 25-Dec-2004 davidxu

1. introduce umtx_owner to get an owner of a umtx.
2. add const qualifier to umtx_timedlock and umtx_timedwait.
3. add missing blackets in umtx do_unlock_and_wait.


# 139258 24-Dec-2004 davidxu

Add umtxq_lock/unlock around umtx_signal, fix debug kernel compiling,
let umtx_lock returns EINTR when it returns ERESTART, this lets
userland have chance to back off mtx lock code when needed.


# 139257 24-Dec-2004 davidxu

1. Fix race condition between umtx lock and unlock, heavy testing
on SMP can explore the bug.
2. Let umtx_wake returns number of threads have been woken.


# 139014 18-Dec-2004 davidxu

1. msleep returns EWOULDBLOCK not ETIMEDOUT, use EWOULDBLOCK instead.
2. Eliminate a possible lock leak in timed wait loop.


# 139013 18-Dec-2004 davidxu

1. make umtx sharable between processes, the way is two or more processes
call mmap() to create a shared space, and then initialize umtx on it,
after that, each thread in different processes can use the umtx same
as threads in same process.
2. introduce a new syscall _umtx_op to support timed lock and condition
variable semantics. also, orignal umtx_lock and umtx_unlock inline
functions now are reimplemented by using _umtx_op, the _umtx_op can
use arbitrary id not just a thread id.


# 138225 30-Nov-2004 davidxu

Forgot to inline umtxq_unlock.


# 138224 30-Nov-2004 davidxu

1. use per-chain mutex instead of global mutex to reduce
lock collision.
2. Fix two race conditions. One is between _umtx_unlock and signal,
also a thread was marked TDF_UMTXWAKEUP by _umtx_unlock, it is
possible a signal delivered to the thread will cause msleep
returns EINTR, and the thread breaks out of loop, this causes
umtx ownership is not transfered to the thread. Another is in
_umtx_unlock itself, when the function sets the umtx to
UMTX_UNOWNED state, a new thread can come in and lock the umtx,
also the function tries to set contested bit flag, but it will
fail. Although the function will wake a blocked thread, if that
thread breaks out of loop by signal, no contested bit will be set.


# 132039 12-Jul-2004 mtm

writers must hold both sched_lock and the process lock; therefore, readers
need only obtain the process lock.


# 131431 01-Jul-2004 marcel

Change the thread ID (thr_id_t) used for 1:1 threading from being a
pointer to the corresponding struct thread to the thread ID (lwpid_t)
assigned to that thread. The primary reason for this change is that
libthr now internally uses the same ID as the debugger and the kernel
when referencing to a kernel thread. This allows us to implement the
support for debugging without additional translations and/or mappings.

To preserve the ABI, the 1:1 threading syscalls, including the umtx
locking API have not been changed to work on a lwpid_t. Instead the
1:1 threading syscalls operate on long and the umtx locking API has
not been changed except for the contested bit. Previously this was
the least significant bit. Now it's the most significant bit. Since
the contested bit should not be tested by userland, this change is
not expected to be visible. Just to be sure, UMTX_CONTESTED has been
removed from <sys/umtx.h>.

Reviewed by: mtm@
ABI preservation tested on: i386, ia64


# 127483 27-Mar-2004 mtm

Use the proc lock to sleep on a libthr umtx.


# 119836 07-Sep-2003 tjr

Return EINVAL if the contested bit is not set on the umtx passed to
_umtx_unlock() instead of firing a KASSERT.


# 117938 23-Jul-2003 peter

Initialize 'blocked' to NULL. I think this was a real problem, but I
am not sure about that. The lack of -Werror and the inline noise hid
this for a while.


# 117778 19-Jul-2003 mtm

Turn a KASSERT back into an EINVAL return value. So, next time someone
comes across it, it will turn into a core dump in userland instead of
a kernel panic. I had also inverted the sense of the test, so

Double pointy hat to: mtm


# 117745 18-Jul-2003 mtm

Remove a lock held across casuptr() that snuck in last commit.


# 117743 18-Jul-2003 mtm

Move the decision on whether to unset the contested
bit or not from lock to unlock time.

Suggested by: jhb


# 117685 17-Jul-2003 mtm

Fix umtx locking, for libthr, in the kernel.
1. There was a race condition between a thread unlocking
a umtx and the thread contesting it. If the unlocking
thread won the race it may try to wakeup a thread that
was not yet in msleep(). The contesting thread would then
go to sleep to await a wakeup that would never come. It's
not possible to close the race by using a lock because
calls to casuptr() may have to fault a page in from swap.
Instead, the race was closed by introducing a flag that
the unlocking thread will set when waking up a thread.
The contesting thread will check for this flag before
going to sleep. For now the flag is kept in td_flags,
but it may be better to use some other member or create
a new one because of the possible performance/contention
issues of having to own sched_lock. Thanks to jhb for
pointing me in the right direction on this one.

2. Once a umtx was contested all future locks and unlocks
were happening in the kernel, regardless of whether it
was contested or not. To prevent this from happening,
when a thread locks a umtx it checks the queue for that
umtx and unsets the contested bit if there are no other
threads waiting on it. Again, this is slightly more
complicated than it needs to be because we can't hold
a lock across casuptr(). So, the thread has to check
the queue again after unseting the bit, and reset the
contested bit if it finds that another thread has put
itself on the queue in the mean time.

3. Remove the if... block for unlocking an uncontested
umtx, and replace it with a KASSERT. The _only_ time
a thread should be unlocking a umtx in the kernel is
if it is contested.


# 117244 04-Jul-2003 mtm

I was so happy I found the semi-colon from hell that I didn't
notice another typo in the same line. This typo makes libthr unuseable,
but it's effects where counter-balanced by the extra semicolon, which
made libthr remarkably useable for the past several months.


# 117219 04-Jul-2003 mtm

It's unfair how one extraneous semi-colon can cause so much grief.


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 115765 03-Jun-2003 jeff

- Remove the blocked pointer from the umtx structure.
- Use a hash of umtx queues to queue blocked threads. We hash on pid and the
virtual address of the umtx structure. This eliminates cases where we
previously held a lock across a casuptr call.

Reviwed by: jhb (quickly)


# 115310 25-May-2003 jeff

- Create a new lock, umtx_lock, for use instead of the proc lock for
protecting the umtx queues. We can't use the proc lock because we need
to hold the lock across calls to casuptr, which can fault.

Approved by: re


# 112967 02-Apr-2003 jake

- Make casuptr return the old value of the location we're trying to update,
and change the umtx code to expect this.

Reviewed by: jeff


# 112904 31-Mar-2003 jeff

- Add an api for doing smp safe locks in userland.
- umtx_lock() is defined as an inline in umtx.h. It tries to do an
uncontested acquire of a lock which falls back to the _umtx_lock()
system-call if that fails.
- umtx_unlock() is also an inline which falls back to _umtx_unlock() if the
uncontested unlock fails.
- Locks are keyed off of the thr_id_t of the currently running thread which
is currently just the pointer to the 'struct thread' in kernel.
- _umtx_lock() uses the proc pointer to synchronize access to blocked thread
queues which are stored in the first blocked thread.