Cross Reference: /seL4-refos-master/libs/libmuslc/src/internal/pthread

History log of /seL4-refos-master/libs/libmuslc/src/internal/pthread_impl.h
Revision	Date	Author	Comments
# f548fc36	02-Apr-2018	Kent McLeod <Kent.Mcleod@data61.csiro.au>	RISC-V: Revert pthread_impl definitions
# 2435f79d	30-Jul-2016	Hesham Almatary <hesham.almatary@data61.csiro.au>	RISC-V port
# 6f186676	06-Dec-2016	Rich Felker <dalias@aerifal.cx>	remove largish unused field from pthread structure
# 4078a5c3	08-Nov-2016	Rich Felker <dalias@aerifal.cx>	fix build regression on archs with variable page size commit 31fb174dd295e50f7c5cf18d31fcfd5fe5a063b7 used DEFAULT_GUARD_SIZE from pthread_impl.h in a static initializer, breaking build on archs where its definition, PAGE_SIZE, is not a constant. instead, just define DEFAULT_GUARD_SIZE as 4096, the minimal page size on any arch we support. pthread_create rounds up to whole pages anyway, so defining it to 1 would also work, but a moderately meaningful value is nicer to programs that use pthread_attr_getguardsize on default-initialized attribute objects.
# 6ba5517a	25-Jun-2015	Rich Felker <dalias@aerifal.cx>	fix local-dynamic model TLS on mips and powerpc the TLS ABI spec for mips, powerpc, and some other (presently unsupported) RISC archs has the return value of __tls_get_addr offset by +0x8000 and the result of DTPOFF relocations offset by -0x8000. I had previously assumed this part of the ABI was actually just an implementation detail, since the adjustments cancel out. however, when the local dynamic model is used for accessing TLS that's known to be in the same DSO, either of the following may happen: 1. the -0x8000 offset may already be applied to the argument structure passed to __tls_get_addr at ld time, without any opportunity for runtime relocations. 2. __tls_get_addr may be used with a zero offset argument to obtain a base address for the module's TLS, to which the caller then applies immediate offsets for individual objects accessed using the local dynamic model. since the immediate offsets have the -0x8000 adjustment applied to them, the base address they use needs to include the +0x8000 offset. it would be possible, but more complex, to store the pointers in the dtv[] array with the +0x8000 offset pre-applied, to avoid the runtime cost of adding 0x8000 on each call to __tls_get_addr. this change could be made later if measurements show that it would help.
# 484194db	06-May-2015	Rich Felker <dalias@aerifal.cx>	fix stack protector crashes on x32 & powerpc due to misplaced TLS canary i386, x86_64, x32, and powerpc all use TLS for stack protector canary values in the default stack protector ABI, but the location only matched the ABI on i386 and x86_64. on x32, the expected location for the canary contained the tid, thus producing spurious mismatches (resulting in process termination) upon fork. on powerpc, the expected location contained the stdio_locks list head, so returning from a function after calling flockfile produced spurious mismatches. in both cases, the random canary was not present, and a predictable value was used instead, making the stack protector hardening much less effective than it should be. in the current fix, the thread structure has been expanded to have canary fields at all three possible locations, and archs that use a non-default location must define a macro in pthread_arch.h to choose which location is used. for most archs (which lack TLS canary ABI) the choice does not matter.
# 01d42747	18-Apr-2015	Rich Felker <dalias@aerifal.cx>	make dlerror state and message thread-local and dynamically-allocated this fixes truncation of error messages containing long pathnames or symbol names. the dlerror state was previously required by POSIX to be global. the resolution of bug 97 relaxed the requirements to allow thread-safe implementations of dlerror with thread-local state and message buffer.
# fa807876	14-Apr-2015	Alexander Monakov <amonakov@ispras.ru>	add missing 'void' in prototypes of internal pthread functions
# f08ab9e6	10-Apr-2015	Rich Felker <dalias@aerifal.cx>	redesign and simplify vmlock system this global lock allows certain unlock-type primitives to exclude mmap/munmap operations which could change the identity of virtual addresses while references to them still exist. the original design mistakenly assumed mmap/munmap would conversely need to exclude the same operations which exclude mmap/munmap, so the vmlock was implemented as a sort of 'symmetric recursive rwlock'. this turned out to be unnecessary. commit 25d12fc0fc51f1fae0f85b4649a6463eb805aa8f already shortened the interval during which mmap/munmap held their side of the lock, but left the inappropriate lock design and some inefficiency. the new design uses a separate function, __vm_wait, which does not hold any lock itself and only waits for lock users which were already present when it was called to release the lock. this is sufficient because of the way operations that need to be excluded are sequenced: the "unlock-type" operations using the vmlock need only block mmap/munmap operations that are precipitated by (and thus sequenced after) the atomic-unlock they perform while holding the vmlock. this allows for a spectacular lack of synchronization in the __vm_wait function itself.
# 204a69d2	10-Mar-2015	Szabolcs Nagy <nsz@port70.net>	copy the dtv pointer to the end of the pthread struct for TLS_ABOVE_TP archs There are two main abi variants for thread local storage layout: (1) TLS is above the thread pointer at a fixed offset and the pthread struct is below that. So the end of the struct is at known offset. (2) the thread pointer points to the pthread struct and TLS starts below it. So the start of the struct is at known (zero) offset. Assembly code for the dynamic TLSDESC callback needs to access the dynamic thread vector (dtv) pointer which is currently at the front of the pthread struct. So in case of (1) the asm code needs to hard code the offset from the end of the struct which can easily break if the struct changes. This commit adds a copy of the dtv at the end of the struct. New members must not be added after dtv_copy, only before it. The size of the struct is increased a bit, but there is opportunity for size optimizations.
# 56fbaa3b	03-Mar-2015	Rich Felker <dalias@aerifal.cx>	make all objects used with atomic operations volatile the memory model we use internally for atomics permits plain loads of values which may be subject to concurrent modification without requiring that a special load function be used. since a compiler is free to make transformations that alter the number of loads or the way in which loads are performed, the compiler is theoretically free to break this usage. the most obvious concern is with atomic cas constructs: something of the form tmp=p;a_cas(p,tmp,f(tmp)); could be transformed to a_cas(p,p,f(p)); where the latter is intended to show multiple loads of p whose resulting values might fail to be equal; this would break the atomicity of the whole operation. but even more fundamental breakage is possible. with the changes being made now, objects that may be modified by atomics are modeled as volatile, and the atomic operations performed on them by other threads are modeled as asynchronous stores by hardware which happens to be acting on the request of another thread. such modeling of course does not itself address memory synchronization between cores/cpus, but that aspect was already handled. this all seems less than ideal, but it's the best we can do without mandating a C11 compiler and using the C11 model for atomics. in the case of pthread_once_t, the ABI type of the underlying object is not volatile-qualified. so we are assuming that accessing the object through a volatile-qualified lvalue via casts yields volatile access semantics. the language of the C standard is somewhat unclear on this matter, but this is an assumption the linux kernel also makes, and seems to be the correct interpretation of the standard.
# 0fc317d8	02-Mar-2015	Rich Felker <dalias@aerifal.cx>	factor cancellation cleanup push/pop out of futex __timedwait function previously, the __timedwait function was optionally a cancellation point depending on whether it was passed a pointer to a cleaup function and context to register. as of now, only one caller actually used such a cleanup function (and it may face removal soon); most callers either passed a null pointer to disable cancellation or a dummy cleanup function. now, __timedwait is never a cancellation point, and __timedwait_cp is the cancellable version. this makes the intent of the calling code more obvious and avoids ugly dummy functions and long argument lists.
# 23614b0f	07-Sep-2014	Rich Felker <dalias@aerifal.cx>	add C11 thread creation and related thread functions based on patch by Jens Gustedt. the main difficulty here is handling the difference between start function signatures and thread return types for C11 threads versus POSIX threads. pointers to void are assumed to be able to represent faithfully all values of int. the function pointer for the thread start function is cast to an incorrect type for passing through pthread_create, but is cast back to its correct type before calling so that the behavior of the call is well-defined. changes to the existing threads implementation were kept minimal to reduce the risk of regressions, and duplication of code that carries implementation-specific assumptions was avoided for ease and safety of future maintenance.
# 5345c9b8	23-Aug-2014	Rich Felker <dalias@aerifal.cx>	fix false ownership of stdio FILEs due to tid reuse this is analogous commit fffc5cda10e0c5c910b40f7be0d4fa4e15bb3f48 which fixed the corresponding issue for mutexes. the robust list can't be used here because the locks do not share a common layout with mutexes. at some point it may make sense to simply incorporate a mutex object into the FILE structure and use it, but that would be a much more invasive change, and it doesn't mesh well with the current design that uses a simpler code path for internal locking and pulls in the recursive-mutex-like code when the flockfile API is used explicitly.
# b8ca9eb5	22-Aug-2014	Rich Felker <dalias@aerifal.cx>	fix fallback checks for kernels without private futex support for unknown syscall commands, the kernel produces ENOSYS, not EINVAL.
# 37195db8	17-Aug-2014	Rich Felker <dalias@aerifal.cx>	redesign cond var implementation to fix multiple issues the immediate issue that was reported by Jens Gustedt and needed to be fixed was corruption of the cv/mutex waiter states when switching to using a new mutex with the cv after all waiters were unblocked but before they finished returning from the wait function. self-synchronized destruction was also handled poorly and may have had race conditions. and the use of sequence numbers for waking waiters admitted a theoretical missed-wakeup if the sequence number wrapped through the full 32-bit space. the new implementation is largely documented in the comments in the source. the basic principle is to use linked lists initially attached to the cv object, but detachable on signal/broadcast, made up of nodes residing in automatic storage (stack) on the threads that are waiting. this eliminates the need for waiters to access the cv object after they are signaled, and allows us to limit wakeup to one waiter at a time during broadcasts even when futex requeue cannot be used. performance is also greatly improved, roughly double some tests. basically nothing is changed in the process-shared cond var case, where this implementation does not work, since processes do not have access to one another's local storage.
# de7e99c5	16-Aug-2014	Rich Felker <dalias@aerifal.cx>	make pointers used in robust list volatile when manipulating the robust list, the order of stores matters, because the code may be asynchronously interrupted by a fatal signal and the kernel will then access the robust list in what is essentially an async-signal context. previously, aliasing considerations made it seem unlikely that a compiler could reorder the stores, but proving that they could not be reordered incorrectly would have been extremely difficult. instead I've opted to make all the pointers used as part of the robust list, including those in the robust list head and in the individual mutexes, volatile. in addition, the format of the robust list has been changed to point back to the head at the end, rather than ending with a null pointer. this is to match the documented kernel robust list ABI. the null pointer, which was previously used, only worked because faults during access terminate the robust list processing.
# bc09d58c	15-Aug-2014	Rich Felker <dalias@aerifal.cx>	make futex operations use private-futex mode when possible private-futex uses the virtual address of the futex int directly as the hash key rather than requiring the kernel to resolve the address to an underlying backing for the mapping in which it lies. for certain usage patterns it improves performance significantly. in many places, the code using futex __wake and __wait operations was already passing a correct fixed zero or nonzero flag for the priv argument, so no change was needed at the site of the call, only in the __wake and __wait functions themselves. in other places, especially where the process-shared attribute for a synchronization object was not previously tracked, additional new code is needed. for mutexes, the only place to store the flag is in the type field, so additional bit masking logic is needed for accessing the type. for non-process-shared condition variable broadcasts, the futex requeue operation is unable to requeue from a private futex to a process-shared one in the mutex structure, so requeue is simply disabled in this case by waking all waiters. for robust mutexes, the kernel always performs a non-private wake when the owner dies. in order not to introduce a behavioral regression in non-process-shared robust mutexes (when the owning thread dies), they are simply forced to be treated as process-shared for now, giving correct behavior at the expense of performance. this can be fixed by adding explicit code to pthread_exit to do the right thing for non-shared robust mutexes in userspace rather than relying on the kernel to do it, and will be fixed in this way later. since not all supported kernels have private futex support, the new code detects EINVAL from the futex syscall and falls back to making the call without the private flag. no attempt to cache the result is made; caching it and using the cached value efficiently is somewhat difficult, and not worth the complexity when the benefits would be seen only on ancient kernels which have numerous other limitations and bugs anyway.
# ac31bf27	10-Jun-2014	Rich Felker <dalias@aerifal.cx>	simplify errno implementation the motivation for the errno_ptr field in the thread structure, which this commit removes, was to allow the main thread's errno to keep its address when lazy thread pointer initialization was used. &errno was evaluated prior to setting up the thread pointer and stored in errno_ptr for the main thread; subsequently created threads would have errno_ptr pointing to their own errno_val in the thread structure. since lazy initialization was removed, there is no need for this extra level of indirection; __errno_location can simply return the address of the thread's errno_val directly. this does cause &errno to change, but the change happens before entry to application code, and thus is not observable.
# 7356c255	03-Aug-2013	Rich Felker <dalias@aerifal.cx>	fix multiple bugs in SIGEV_THREAD timers 1. the thread result field was reused for storing a kernel timer id, but would be overwritten if the application code exited or cancelled the thread. 2. low pointer values were used as the indicator that the timer id is a kernel timer id rather than a thread id. this is not portable, as mmap may return low pointers on some conditions. instead, use the fact that pointers must be aligned and kernel timer ids must be non-negative to map pointers into the negative integer space. 3. signals were not blocked until after the timer thread started, so a race condition could allow a signal handler to run in the timer thread when it's not supposed to exist. this is mainly problematic if the calling thread was the only thread where the signal was unblocked and the signal handler assumes it runs in that thread.
# 2c074b0d	26-Apr-2013	Rich Felker <dalias@aerifal.cx>	transition to using functions for internal signal blocking/restoring there are several reasons for this change. one is getting rid of the repetition of the syscall signature all over the place. another is sharing the constant masks without costly GOT accesses in PIC. the main motivation, however, is accurately representing whether we want to block signals that might be handled by the application, or all signals.
# 14a835b3	31-Mar-2013	Rich Felker <dalias@aerifal.cx>	implement pthread_getattr_np this function is mainly (purely?) for obtaining stack address information, but we also provide the detach state since it's easy to do anyway.
# ccc7b4c3	26-Mar-2013	Rich Felker <dalias@aerifal.cx>	remove __SYSCALL_SSLEN arch macro in favor of using public _NSIG the issue at hand is that many syscalls require as an argument the kernel-ABI size of sigset_t, intended to allow the kernel to switch to a larger sigset_t in the future. previously, each arch was defining this size in syscall_arch.h, which was redundant with the definition of _NSIG in bits/signal.h. as it's used in some not-quite-portable application code as well, _NSIG is much more likely to be recognized and understood immediately by someone reading the code, and it's also shorter and less cluttered. note that _NSIG is actually 65/129, not 64/128, but the division takes care of throwing away the off-by-one part.
# facc6acb	01-Feb-2013	Rich Felker <dalias@aerifal.cx>	replace __wake function with macro that performs direct syscall this should generate faster and smaller code, especially with inline syscalls. the conditional with cnt is ugly, but thankfully cnt is always a constant anyway so it gets evaluated at compile time. it may be preferable to make separate __wake and __wakeall macros without a count argument. priv flag is not used yet; private futex support still needs to be done at some point in the future.
# 1e21e78b	11-Nov-2012	Rich Felker <dalias@aerifal.cx>	add support for thread scheduling (POSIX TPS option) linux's sched_* syscalls actually implement the TPS (thread scheduling) functionality, not the PS (process scheduling) functionality which the sched_* functions are supposed to have. omitting support for the PS option (and having the sched_* interfaces fail with ENOSYS rather than omitting them, since some broken software assumes they exist) seems to be the only conforming way to do this on linux.
# efd4d87a	08-Nov-2012	Rich Felker <dalias@aerifal.cx>	clean up sloppy nested inclusion from pthread_impl.h this mirrors the stdio_impl.h cleanup. one header which is not strictly needed, errno.h, is left in pthread_impl.h, because since pthread functions return their error codes rather than using errno, nearly every single pthread function needs the errno constants. in a few places, rather than bringing in string.h to use memset, the memset was replaced by direct assignment. this seems to generate much better code anyway, and makes many functions which were previously non-leaf functions into leaf functions (possibly eliminating a great deal of bloat on some platforms where non-leaf functions require ugly prologue and/or epilogue).
# dcd60371	05-Oct-2012	Rich Felker <dalias@aerifal.cx>	support for TLS in dynamic-loaded (dlopen) modules unlike other implementations, this one reserves memory for new TLS in all pre-existing threads at dlopen-time, and dlopen will fail with no resources consumed and no new libraries loaded if memory is not available. memory is not immediately distributed to running threads; that would be too complex and too costly. instead, assurances are made that threads needing the new TLS can obtain it in an async-signal-safe way from a buffer belonging to the dynamic linker/new module (via atomic fetch-and-add based allocator). I've re-appropriated the lock that was previously used for __synccall (synchronizing set*id() syscalls between threads) as a general pthread_create lock. it's a "backwards" rwlock where the "read" operation is safe atomic modification of the live thread count, which multiple threads can perform at the same time, and the "write" operation is making sure the count does not increase during an operation that depends on it remaining bounded (__synccall or dlopen). in static-linked programs that don't use __synccall, this lock is a no-op and has no cost.
# 9b153c04	04-Oct-2012	Rich Felker <dalias@aerifal.cx>	beginnings of full TLS support in shared libraries this code will not work yet because the necessary relocations are not supported, and cannot be supported without some internal changes to how relocation processing works (coming soon).
# 2f437040	09-Aug-2012	Rich Felker <dalias@aerifal.cx>	fix (hopefully) all hard-coded 8's for kernel sigset_t size some minor changes to how hard-coded sets for thread-related purposes are handled were also needed, since the old object sizes were not necessarily sufficient. things have gotten a bit ugly in this area, and i think a cleanup is in order at some point, but for now the goal is just to get the code working on all supported archs including mips, which was badly broken by linux rejecting syscalls with the wrong sigset_t size.
# bbbe87e3	12-Jul-2012	Rich Felker <dalias@aerifal.cx>	fix several locks that weren't updated right for new futex-based __lock these could have caused memory corruption due to invalid accesses to the next field. all should be fixed now; I found the errors with fgrep -r '__lock(&', which is bogus since the argument should be an array.
# 819006a8	09-Jun-2012	Rich Felker <dalias@aerifal.cx>	add pthread_attr_setstack interface (and get) i originally omitted these (optional, per POSIX) interfaces because i considered them backwards implementation details. however, someone later brought to my attention a fairly legitimate use case: allocating thread stacks in memory that's setup for sharing and/or fast transfer between CPU and GPU so that the thread can move data to a GPU directly from automatic-storage buffers without having to go through additional buffer copies. perhaps there are other situations in which these interfaces are useful too.
# 13b3645c	02-Jun-2012	Rich Felker <dalias@aerifal.cx>	increase default thread stack size to 80k I've been looking for data that would suggest a good default, and since little has shown up, i'm doing this based on the limited data I have. the value 80k is chosen to accommodate 64k of application data (which happens to be the size of the buffer in git that made it crash without a patch to call pthread_attr_setstacksize) plus the max stack usage of most libc functions (with a few exceptions like crypt, which will be fixed soon to avoid excessive stack usage, and [n]ftw, which inherently uses a fair bit in recursive directory searching). if further evidence emerges suggesting that the default should be larger, I'll consider changing it again, but I'd like to avoid it getting too large to avoid the issues of large commit charge and rapid address space exhaustion on 32-bit machines.
# 7efd14ec	24-May-2012	Rich Felker <dalias@aerifal.cx>	remove cruft from pthread structure (old cancellation stuff)
# 9ae1cf6d	21-May-2012	Rich Felker <dalias@aerifal.cx>	fix out-of-bounds array access in pthread barriers on 64-bit it's ok to overlap with integer slot 3 on 32-bit because only slots 0-2 are used on process-local barriers.
# 58aa5f45	03-May-2012	Rich Felker <dalias@aerifal.cx>	overhaul SSP support to use a real canary pthread structure has been adjusted to match the glibc/GCC abi for where the canary is stored on i386 and x86_64. it will need variants for other archs to provide the added security of the canary's entropy, but even without that it still works as well as the old "minimal" ssp support. eventually such changes will be made anyway, since they are also needed for GCC/C11 thread-local storage support (not yet implemented). care is taken not to attempt initializing the thread pointer unless the program actually uses SSP (by reference to __stack_chk_fail).
# 5a2e1809	02-Oct-2011	Rich Felker <dalias@aerifal.cx>	synchronize cond var destruction with exiting waits
# 9cee9307	28-Sep-2011	Rich Felker <dalias@aerifal.cx>	improve pshared barriers eliminate the sequence number field and instead use the counter as the futex because of the way the lock is held, sequence numbers are completely useless, and this frees up a field in the barrier structure to be used as a waiter count for the count futex, which lets us avoid some syscalls in the best case. as of now, self-synchronized destruction and unmapping should be fully safe. before any thread can return from the barrier, all threads in the barrier have obtained the vm lock, and each holds a shared lock on the barrier. the barrier memory is not inspected after the shared lock count reaches 0, nor after the vm lock is released.
# 60164570	27-Sep-2011	Rich Felker <dalias@aerifal.cx>	process-shared barrier support, based on discussion with bdonlan this implementation is rather heavy-weight, but it's the first solution i've found that's actually correct. all waiters actually wait twice at the barrier so that they can synchronize exit, and they hold a "vm lock" that prevents changes to virtual memory mappings (and blocks pthread_barrier_destroy) until all waiters are finished inspecting the barrier. thus, it is safe for any thread to destroy and/or unmap the barrier's memory as soon as pthread_barrier_wait returns, without further synchronization.
# 1fa05210	25-Sep-2011	Rich Felker <dalias@aerifal.cx>	fix lost signals in cond vars due to moving waiters from the cond var to the mutex in bcast, these waiters upon wakeup would steal slots in the count from newer waiters that had not yet been signaled, preventing the signal function from taking any action. to solve the problem, we simply use two separate waiter counts, and so that the original "total" waiters count is undisturbed by broadcast and still available for signal.
# 729d6368	25-Sep-2011	Rich Felker <dalias@aerifal.cx>	redo cond vars again, use sequence numbers testing revealed that the old implementation, while correct, was giving way too many spurious wakeups due to races changing the value of the condition futex. in a test program with 5 threads receiving broadcast signals, the number of returns from pthread_cond_wait was roughly 3 times what it should have been (2 spurious wakeups for every legitimate wakeup). moreover, the magnitude of this effect seems to grow with the number of threads. the old implementation may also have had some nasty race conditions with reuse of the cond var with a new mutex. the new implementation is based on incrementing a sequence number with each signal event. this sequence number has nothing to do with the number of threads intended to be woken; it's only used to provide a value for the futex wait to avoid deadlock. in theory there is a danger of race conditions due to the value wrapping around after 2^32 signals. it would be nice to eliminate that, if there's a way. testing showed no spurious wakeups (though they are of course possible) with the new implementation, as well as slightly improved performance.
# cba4e1c0	25-Sep-2011	Rich Felker <dalias@aerifal.cx>	new futex-requeue-based pthread_cond_broadcast implementation this avoids the "stampede effect" where pthread_cond_broadcast would result in all waiters waking up simultaneously, only to immediately contend for the mutex and go back to sleep.
# 4b153ac4	22-Sep-2011	Rich Felker <dalias@aerifal.cx>	fix deadlock in condition wait whenever there are multiple waiters it's amazing none of the conformance tests i've run even bothered to check whether something so basic works...
# 3f72cdac	18-Sep-2011	Rich Felker <dalias@aerifal.cx>	overhaul clone syscall wrapping several things are changed. first, i have removed the old __uniclone function signature and replaced it with the "standard" linux __clone/clone signature. this was necessary to expose clone to applications anyway, and it makes it easier to port __clone to new archs, since it's now testable independently of pthread_create. secondly, i have removed all references to the ugly ldt descriptor structure (i386 only) from the c code and pthread structure. in places where it is needed, it is now created on the stack just when it's needed, in assembly code. thus, the i386 __clone function takes the desired thread pointer as its argument, rather than an ldt descriptor pointer, just like on all other sane archs. this should not affect applications since there is really no way an application can use clone with threads/tls in a way that doesn't horribly conflict with and clobber the underlying implementation's use. applications are expected to use clone only for creating actual processes, possibly with new namespace features and whatnot.
# 407d9330	12-Aug-2011	Rich Felker <dalias@aerifal.cx>	pthread and synccall cleanup, new __synccall_wait op fix up clone signature to match the actual behavior. the new __syncall_wait function allows a __synccall callback to wait for other threads to continue without returning, so that it can resume action after the caller finishes. this interface could be made significantly more general/powerful with minimal effort, but i'll wait to do that until it's actually useful for something.
# 50304f2e	03-Aug-2011	Rich Felker <dalias@aerifal.cx>	overhaul rwlocks to address several issues like mutexes and semaphores, rwlocks suffered from a race condition where the unlock operation could access the lock memory after another thread successfully obtained the lock (and possibly destroyed or unmapped the object). this has been fixed in the same way it was fixed for other lock types. in addition, the previous implementation favored writers over readers. in the absence of other considerations, that is the best behavior for rwlocks, and posix explicitly allows it. however posix also requires read locks to be recursive. if writers are favored, any attempt to obtain a read lock while a writer is waiting for the lock will fail, causing "recursive" read locks to deadlock. this can be avoided by keeping track of which threads already hold read locks, but doing so requires unbounded memory usage, and there must be a fallback case that favors readers in case memory allocation failed. and all of this must be synchronized. the cost, complexity, and risk of errors in getting it right is too great, so we simply favor readers. tracking of the owner of write locks has been removed, as it was not useful for anything. it could allow deadlock detection, but it's not clear to me that returning EDEADLK (which a buggy program is likely to ignore) is better than deadlocking; at least the latter behavior prevents further data corruption. a correct program cannot invoke this situation anyway. the reader count and write lock state, as well as the "last minute" waiter flag have all been combined into a single atomic lock. this means all state transitions for the lock are atomic compare-and-swap operations. this makes establishing correctness much easier and may improve performance. finally, some code duplication has been cleaned up. more is called for, especially the standard __timedwait idiom repeated in all locks.
# ec381af9	02-Aug-2011	Rich Felker <dalias@aerifal.cx>	unify and overhaul timed futex waits new features: - FUTEX_WAIT_BITSET op will be used for timed waits if available. this saves a call to clock_gettime. - error checking for the timespec struct is now inside __timedwait so it doesn't need to be duplicated everywhere. cond_timedwait still needs to duplicate it to avoid unlocking the mutex, though. - pushing and popping the cancellation handler is delegated to __timedwait, and cancellable/non-cancellable waits are unified.
# dba68bf9	30-Jul-2011	Rich Felker <dalias@aerifal.cx>	add proper fuxed-based locking for stdio previously, stdio used spinlocks, which would be unacceptable if we ever add support for thread priorities, and which yielded pathologically bad performance if an application attempted to use flockfile on a key file as a major/primary locking mechanism. i had held off on making this change for fear that it would hurt performance in the non-threaded case, but actually support for recursive locking had already inflicted that cost. by having the internal locking functions store a flag indicating whether they need to perform unlocking, rather than using the actual recursive lock counter, i was able to combine the conditionals at unlock time, eliminating any additional cost, and also avoid a nasty corner case where a huge number of calls to ftrylockfile could cause deadlock later at the point of internal locking. this commit also fixes some issues with usage of pthread_self conflicting with __attribute__((const)) which resulted in crashes with some compiler versions/optimizations, mainly in flockfile prior to pthread_create.
# acb04806	29-Jul-2011	Rich Felker <dalias@aerifal.cx>	new attempt at making setid() safe and robust changing credentials in a multi-threaded program is extremely difficult on linux because it requires synchronizing the change between all threads, which have their own thread-local credentials on the kernel side. this is further complicated by the fact that changing the real uid can fail due to exceeding RLIMIT_NPROC, making it possible that the syscall will succeed in some threads but fail in others. the old __rsyscall approach being replaced was robust in that it would report failure if any one thread failed, but in this case, the program would be left in an inconsistent state where individual threads might have different uid. (this was not as bad as glibc, which would sometimes even fail to report the failure entirely!) the new approach being committed refuses to change real user id when it cannot temporarily set the rlimit to infinity. this is completely POSIX conformant since POSIX does not require an implementation to allow real-user-id changes for non-privileged processes whatsoever. still, setting the real uid can fail due to memory allocation in the kernel, but this can only happen if there is not already a cached object for the target user. thus, we forcibly serialize the syscalls attempts, and fail the entire operation on the first failure. this should* lead to an all-or-nothing success/failure result, but it's still fragile and highly dependent on kernel developers not breaking things worse than they're already broken. ideally linux will eventually add a CLONE_USERCRED flag that would give POSIX conformant credential changes without any hacks from userspace, and all of this code would become redundant and could be removed ~10 years down the line when everyone has abandoned the old broken kernels. i'm not holding my breath...
# 7779dbd2	13-Jun-2011	Rich Felker <dalias@aerifal.cx>	fix race condition in pthread_kill if thread id was reused by the kernel between the time pthread_kill read it from the userspace pthread_t object and the time of the tgkill syscall, a signal could be sent to the wrong thread. the tgkill syscall was supposed to prevent this race (versus the old tkill syscall) but it can't; it can only help in the case where the tid is reused in a different process, but not when the tid is reused in the same process. the only solution i can see is an extra lock to prevent threads from exiting while another thread is trying to pthread_kill them. it should be very very cheap in the non-contended case.
# f09e78de	13-Jun-2011	Rich Felker <dalias@aerifal.cx>	fix sigset macro for 64-bit systems (<< was overflowing due to wrong type)
# 11c531e2	29-May-2011	Rich Felker <dalias@aerifal.cx>	implement uselocale function (minimal)
# 4c4e22d7	07-May-2011	Rich Felker <dalias@aerifal.cx>	optimize compound-literal sigset_t's not to contain useless hurd bits
# 99b8a25e	07-May-2011	Rich Felker <dalias@aerifal.cx>	overhaul implementation-internal signal protections the new approach relies on the fact that the only ways to create sigset_t objects without invoking UB are to use the sigset() functions, or from the masks returned by sigprocmask, sigaction, etc. or in the ucontext_t argument to a signal handler. thus, as long as sigfillset and sigaddset avoid adding the "protected" signals, there is no way the application will ever obtain a sigset_t including these bits, and thus no need to add the overhead of checking/clearing them when sigprocmask or sigaction is called. note that the old code actually failed* to remove the bits from sa_mask when sigaction was called. the new implementations are also significantly smaller, simpler, and faster due to ignoring the useless "GNU HURD signals" 65-1024, which are not used and, if there's any sanity in the world, never will be used.
# f16a3089	06-May-2011	Rich Felker <dalias@aerifal.cx>	completely new barrier implementation, addressing major correctness issues the previous implementation had at least 2 problems: 1. the case where additional threads reached the barrier before the first wave was finished leaving the barrier was untested and seemed not to be working. 2. threads leaving the barrier continued to access memory within the barrier object after other threads had successfully returned from pthread_barrier_wait. this could lead to memory corruption or crashes if the barrier object had automatic storage in one of the waiting threads and went out of scope before all threads finished returning, or if one thread unmapped the memory in which the barrier object lived. the new implementation avoids both problems by making the barrier state essentially local to the first thread which enters the barrier wait, and forces that thread to be the last to return.
# feee9890	17-Apr-2011	Rich Felker <dalias@aerifal.cx>	overhaul pthread cancellation this patch improves the correctness, simplicity, and size of cancellation-related code. modulo any small errors, it should now be completely conformant, safe, and resource-leak free. the notion of entering and exiting cancellation-point context has been completely eliminated and replaced with alternative syscall assembly code for cancellable syscalls. the assembly is responsible for setting up execution context information (stack pointer and address of the syscall instruction) which the cancellation signal handler can use to determine whether the interrupted code was in a cancellable state. these changes eliminate race conditions in the previous generation of cancellation handling code (whereby a cancellation request received just prior to the syscall would not be processed, leaving the syscall to block, potentially indefinitely), and remedy an issue where non-cancellable syscalls made from signal handlers became cancellable if the signal handler interrupted a cancellation point. x86_64 asm is untested and may need a second try to get it right.
# 016a5dc1	13-Apr-2011	Rich Felker <dalias@aerifal.cx>	use a separate signal from SIGCANCEL for SIGEV_THREAD timers otherwise we cannot support an application's desire to use asynchronous cancellation within the callback function. this change also slightly debloats pthread_create.c.
# 82171d6a	09-Apr-2011	Rich Felker <dalias@aerifal.cx>	greatly improve SIGEV_THREAD timers calling pthread_exit from, or pthread_cancel on, the timer callback thread will no longer destroy the timer.
# b2486a89	06-Apr-2011	Rich Felker <dalias@aerifal.cx>	move rsyscall out of pthread_create module this is something of a tradeoff, as now setid() functions, rather than pthread_create, are what pull in the code overhead for dealing with linux's refusal to implement proper POSIX thread-vs-process semantics. my motivations are: 1. it's cleaner this way, especially cleaner to optimize out the rsyscall locking overhead from pthread_create when it's not needed. 2. it's expected that only a tiny number of core system programs will ever use setid() functions, whereas many programs may want to use threads, and making thread overhead tiny is an incentive for "light" programs to try threads.
# b8be64c4	29-Mar-2011	Rich Felker <dalias@aerifal.cx>	optimize timer creation and possibly protect against some minor races the major idea of this patch is not to depend on having the timer pointer delivered to the signal handler, and instead use the thread pointer to get the callback function address and argument. this way, the parent thread can make the timer_create syscall while the child thread is starting, and it should never have to block waiting for the barrier.
# bf619d82	28-Mar-2011	Rich Felker <dalias@aerifal.cx>	major improvements to cancellation handling - there is no longer any risk of spoofing cancellation requests, since the cancel flag is set in pthread_cancel rather than in the signal handler. - cancellation signal is no longer unblocked when running the cancellation handlers. instead, pthread_create will cause any new threads created from a cancellation handler to unblock their own cancellation signal. - various tweaks in preparation for POSIX timer support.
# 70c31c7b	29-Mar-2011	Rich Felker <dalias@aerifal.cx>	some preliminaries for adding POSIX timers
# 83b6c9e0	28-Mar-2011	Rich Felker <dalias@aerifal.cx>	remove useless field in pthread struct (wasted a good bit of space)
# 047e434e	17-Mar-2011	Rich Felker <dalias@aerifal.cx>	implement robust mutexes some of this code should be cleaned up, e.g. using macros for some of the bit flags, masks, etc. nonetheless, the code is believed to be working and correct at this point.
# 93cc986a	17-Mar-2011	Rich Felker <dalias@aerifal.cx>	reorder mutex struct fields to make room for pointers (upcoming robust mutexes) the layout has been chosen so that pointer slots 3 and 4 fit between the integer slots on 32-bit archs, and come after the integer slots on 64-bit archs.
# b1c43161	16-Mar-2011	Rich Felker <dalias@aerifal.cx>	unify lock and owner fields of mutex structure this change is necessary to free up one slot in the mutex structure so that we can use doubly-linked lists in the implementation of robust mutexes.
# 5fcebcde	10-Mar-2011	Rich Felker <dalias@aerifal.cx>	optimize pthread termination in the non-detached case we can avoid blocking signals by simply using a flag to mark that the thread has exited and prevent it from getting counted in the rsyscall signal-pingpong. this restores the original pthread create/join throughput from before the sigprocmask call was added.
# 4820f926	08-Mar-2011	Rich Felker <dalias@aerifal.cx>	fix and optimize non-default-type mutex behavior problem 1: mutex type from the attribute was being ignored by pthread_mutex_init, so recursive/errorchecking mutexes were never being used at all. problem 2: ownership of recursive mutexes was not being enforced at unlock time.
# 5fd4a981	07-Mar-2011	Rich Felker <dalias@aerifal.cx>	use the selected clock from the condattr for pthread_cond_timedwait
# e8827563	17-Feb-2011	Rich Felker <dalias@aerifal.cx>	reorganize pthread data structures and move the definitions to alltypes.h this allows sys/types.h to provide the pthread types, as required by POSIX. this design also facilitates forcing ABI-compatible sizes in the arch-specific alltypes.h, while eliminating the need for developers changing the internals of the pthread types to poke around with arch-specific headers they may not be able to test.
# 7b2dd223	15-Feb-2011	Rich Felker <dalias@aerifal.cx>	finish unifying thread register handling in preparation for porting
# 0b2006c8	15-Feb-2011	Rich Felker <dalias@aerifal.cx>	begin unifying clone/thread management interface in preparation for porting
# 0b44a031	11-Feb-2011	Rich Felker <dalias@aerifal.cx>	initial check-in, version 0.5.0