History log of /freebsd-9.3-release/lib/libthr/thread/thr_umtx.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 234372 17-Apr-2012 davidxu

Merge 233103, 233912 from head:

233103:
Some software think a mutex can be destroyed after it owned it, for
example, it uses a serialization point like following:
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
pthread_mutex_destroy(&muetx);
They think a previous lock holder should have already left the mutex and
is no longer referencing it, so they destroy it. To be maximum compatible
with such code, we use IA64 version to unlock the mutex in kernel, remove
the two steps unlocking code.

233912:
umtx operation UMTX_OP_MUTEX_WAKE has a side-effect that it accesses
a mutex after a thread has unlocked it, it event writes data to the mutex
memory to clear contention bit, there is a race that other threads
can lock it and unlock it, then destroy it, so it should not write
data to the mutex memory if there isn't any waiter.
The new operation UMTX_OP_MUTEX_WAKE2 try to fix the problem. It
requires thread library to clear the lock word entirely, then
call the WAKE2 operation to check if there is any waiter in kernel,
and try to wake up a thread, if necessary, the contention bit is set again
by the operation. This also mitgates the chance that other threads find
the contention bit and try to enter kernel to compete with each other
to wake up sleeping thread, this is unnecessary. With this change, the
mutex owner is no longer holding the mutex until it reaches a point
where kernel umtx queue is locked, it releases the mutex as soon as
possible.
Performance is improved when the mutex is contensted heavily. On Intel
i3-2310M, the runtime of a benchmark program is reduced from 26.87 seconds
to 2.39 seconds, it even is better than UMTX_OP_MUTEX_WAKE which is
deprecated now. http://people.freebsd.org/~davidxu/bench/mutex_perf.c

Special code for stable/9:
And add code to detect if the UMTX_OP_MUTEX_WAKE2 is available.


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 216641 22-Dec-2010 davidxu

MFp4:

- Add flags CVWAIT_ABSTIME and CVWAIT_CLOCKID for umtx kernel based
condition variable, this should eliminate an extra system call to get
current time.

- Add sub-function UMTX_OP_NWAKE_PRIVATE to wake up N channels in single
system call. Create userland sleep queue for condition variable, in most
cases, thread will wait in the queue, the pthread_cond_signal will defer
thread wakeup until the mutex is unlocked, it tries to avoid an extra
system call and a extra context switch in time window of pthread_cond_signal
and pthread_mutex_unlock.

The changes are part of process-shared mutex project.


# 212077 01-Sep-2010 davidxu

Change atfork lock from mutex to rwlock, also make mutexes used by malloc()
module private type, when private type mutex is locked/unlocked, thread
critical region is entered or leaved. These changes makes fork()
async-signal safe which required by POSIX. Note that user's atfork handler
still needs to be async-signal safe, but it is not problem of libthr, it
is user's responsiblity.


# 212076 01-Sep-2010 davidxu

Add signal handler wrapper, the reason to add it becauses there are
some cases we want to improve:
1) if a thread signal got a signal while in cancellation point,
it is possible the TDP_WAKEUP may be eaten by signal handler
if the handler called some interruptibly system calls.
2) In signal handler, we want to disable cancellation.
3) When thread holding some low level locks, it is better to
disable signal, those code need not to worry reentrancy,
sigprocmask system call is avoided because it is a bit expensive.
The signal handler wrapper works in this way:
1) libthr installs its signal handler if user code invokes sigaction
to install its handler, the user handler is recorded in internal
array.
2) when a signal is delivered, libthr's signal handler is invoke,
libthr checks if thread holds some low level lock or is in critical
region, if it is true, the signal is buffered, and all signals are
masked, once the thread leaves critical region, correct signal
mask is restored and buffered signal is processed.
3) before user signal handler is invoked, cancellation is temporarily
disabled, after user signal handler is returned, cancellation state
is restored, and pending cancellation is rescheduled.


# 197445 23-Sep-2009 attilio

rwlock implemented from libthr need to fall through the 'hard path' and
query umtx also if the shared waiters bit is set on a shared lock.
The writer starvation avoidance technique, infact, can lead to shared
waiters on a shared lock which can bring to a missed wakeup and thus
to a deadlock if the right bit is not checked (a notable case is the
writers counterpart to be handled through expired timeouts).

Fix that by checking for the shared waiters bit also when unlocking the
shared locks.

That bug was causing a reported MySQL deadlock.
Many thanks go to Nick Esborn and his employer DesertNet which provided
time and machines to identify and fix this issue.

PR: thread/135673
Reported by: Nick Esborn <nick at desert dot net>
Tested by: Nick Esborn <nick at desert dot net>
Reviewed by: jeff


# 179970 24-Jun-2008 davidxu

Add two commands to _umtx_op system call to allow a simple mutex to be
locked and unlocked completely in userland. by locking and unlocking mutex
in userland, it reduces the total time a mutex is locked by a thread,
in some application code, a mutex only protects a small piece of code, the
code's execution time is less than a simple system call, if a lock contention
happens, however in current implemenation, the lock holder has to extend its
locking time and enter kernel to unlock it, the change avoids this disadvantage,
it first sets mutex to free state and then enters kernel and wake one waiter
up. This improves performance dramatically in some sysbench mutex tests.

Tested by: kris
Sounds great: jeff


# 178647 29-Apr-2008 davidxu

Use UMTX_OP_WAIT_UINT_PRIVATE and UMTX_OP_WAKE_PRIVATE to save
time in kernel(avoid VM lookup).


# 177850 02-Apr-2008 davidxu

Replace userland rwlock with a pure kernel based rwlock, the new
implementation does not switch pointers when it resumes waiters.

Asked by: jeff


# 173801 21-Nov-2007 davidxu

Remove umtx_t definition, use type long directly, add wrapper function
_thr_umtx_wait_uint() for umtx operation UMTX_OP_WAIT_UINT, use the
function in semaphore operations, this fixed compiler warnings.


# 165370 20-Dec-2006 davidxu

Check environment variable PTHREAD_ADAPTIVE_SPIN, if it is set, use
it as a default spin cycle count.


# 165206 14-Dec-2006 davidxu

Create inline function _thr_umutex_trylock2 to only try one atomic
operation, if it is failed, we call syscall directly, this saves
one atomic operation per lock contention.


# 164902 05-Dec-2006 davidxu

Add _thr_ucond_init().


# 164877 04-Dec-2006 davidxu

Use kernel provided userspace condition variable to implement pthread
condition variable.


# 163334 13-Oct-2006 davidxu

o Make _thr_umutex_init a function.
o Eliminate unused parameter for some functions.
o Convert type of first parameter to void * for _thr_umtx_wait
and _thr_umtx_wake.


# 162061 06-Sep-2006 davidxu

Replace internal usage of struct umtx with umutex which can supports
real-time if we want, no functionality is changed.


# 161680 28-Aug-2006 davidxu

Add umutex APIs.


# 153594 21-Dec-2005 davidxu

Hide umtx API symbols as well.


# 148470 28-Jul-2005 davidxu

Cast to uintptr_t to avoid compiler warning, it was broken by
the recent atomic_ptr() change.


# 144518 01-Apr-2005 davidxu

Import my recent 1:1 threading working. some features improved includes:
1. fast simple type mutex.
2. __thread tls works.
3. asynchronous cancellation works ( using signal ).
4. thread synchronization is fully based on umtx, mainly, condition
variable and other synchronization objects were rewritten by using
umtx directly. those objects can be shared between processes via
shared memory, it has to change ABI which does not happen yet.
5. default stack size is increased to 1M on 32 bits platform, 2M for
64 bits platform.
As the result, some mysql super-smack benchmarks show performance is
improved massivly.

Okayed by: jeff, mtm, rwatson, scottl