Cross Reference: /freebsd-10.1-release/sys/kern/kern

History log of /freebsd-10.1-release/sys/kern/kern_rwlock.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 272461	02-Oct-2014	gjb	Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-10.1-release
# 262192	18-Feb-2014	jhb	MFC 261517,261520: Convert the license on files where I am the sole copyright holder to 2 clause BSD licenses.
# 256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 255788	22-Sep-2013	davide	Consistently use the same value to indicate exclusively-held and shared-held locks for all the primitives in lc_lock/lc_unlock routines. This fixes the problems introduced in r255747, which indeed introduced an inversion in the logic. Reported by: many Tested by: bdrewery, pho, lme, Adam McDougall, O. Hartmann Approved by: re (glebius)
# 255745	20-Sep-2013	davide	Fix lc_lock/lc_unlock() support for rmlocks held in shared mode. With current lock classes KPI it was really difficult because there was no way to pass an rmtracker object to the lock/unlock routines. In order to accomplish the task, modify the aforementioned functions so that they can return (or pass as argument) an uinptr_t, which is in the rm case used to hold a pointer to struct rm_priotracker for current thread. As an added bonus, this fixes rm_sleep() in the rm shared case, which right now can communicate priotracker structure between lc_unlock()/lc_lock(). Suggested by: jhb Reviewed by: jhb Approved by: re (delphij)
# 252212	25-Jun-2013	jhb	A few mostly cosmetic nits to aid in debugging: - Call lock_init() first before setting any lock_object fields in lock init routines. This way if the machine panics due to a duplicate init the lock's original state is preserved. - Somewhat similarly, don't decrement td_locks and td_slocks until after an unlock operation has completed successfully.
# 251323	03-Jun-2013	jhb	- Handle the recursed/not recursed flags with RA_RLOCKED in rw_assert(). - Tweak a panic message.
# 244582	22-Dec-2012	attilio	Fixup r240424: On entering KDB backends, the hijacked thread to run interrupt context can still be idlethread. At that point, without the panic condition, it can still happen that idlethread then will try to acquire some locks to carry on some operations. Skip the idlethread check on block/sleep lock operations when KDB is active. Reported by: jh Tested by: jh MFC after: 1 week
# 242515	03-Nov-2012	attilio	Merge r242395,242483 from mutex implementation: give rwlock(9) the ability to crunch different type of structures, with the only constraint that they have a lock cookie named rw_lock. This name, then, becames reserved from the struct that wants to use the rwlock(9) KPI and other locking primitives cannot reuse it for their members. Namely such structs are the current struct rwlock and the new struct rwlock_padalign. The new structure will define an object which has the same layout of a struct rwlock but will be allocated in areas aligned to the cache line size and will be as big as a cache line. For further details check comments on above mentioned revisions. Reviewed by: jimharris, jeff
# 240475	13-Sep-2012	attilio	Remove all the checks on curthread != NULL with the exception of some MD trap checks (eg. printtrap()). Generally this check is not needed anymore, as there is not a legitimate case where curthread != NULL, after pcpu 0 area has been properly initialized. Reviewed by: bde, jhb MFC after: 1 week
# 240424	12-Sep-2012	attilio	Improve check coverage about idle threads. Idle threads are not allowed to acquire any lock but spinlocks. Deny any attempt to do so by panicing at the locking operation when INVARIANTS is on. Then, remove the check on blocking on a turnstile. The check in sleepqueues is left because they are not allowed to use tsleep() either which could happen still. Reviewed by: bde, jhb, kib MFC after: 1 week
# 233628	28-Mar-2012	fabient	Add software PMC support. New kernel events can be added at various location for sampling or counting. This will for example allow easy system profiling whatever the processor is with known tools like pmcstat(8). Simultaneous usage of software PMC and hardware PMC is possible, for example looking at the lock acquire failure, page fault while sampling on instructions. Sponsored by: NETASQ MFC after: 1 month
# 228424	11-Dec-2011	avg	panic: add a switch and infrastructure for stopping other CPUs in SMP case Historical behavior of letting other CPUs merily go on is a default for time being. The new behavior can be switched on via kern.stop_scheduler_on_panic tunable and sysctl. Stopping of the CPUs has (at least) the following benefits: - more of the system state at panic time is preserved intact - threads and interrupts do not interfere with dumping of the system state Only one thread runs uninterrupted after panic if stop_scheduler_on_panic is set. That thread might call code that is also used in normal context and that code might use locks to prevent concurrent execution of certain parts. Those locks might be held by the stopped threads and would never be released. To work around this issue, it was decided that instead of explicit checks for panic context, we would rather put those checks inside the locking primitives. This change has substantial portions written and re-written by attilio and kib at various times. Other changes are heavily based on the ideas and patches submitted by jhb and mdf. bde has provided many insights into the details and history of the current code. The new behavior may cause problems for systems that use a USB keyboard for interfacing with system console. This is because of some unusual locking patterns in the ukbd code which have to be used because on one hand ukbd is below syscons, but on the other hand it has to interface with other usb code that uses regular mutexes/Giant for its concurrency protection. Dumping to USB-connected disks may also be affected. PR: amd64/139614 (at least) In cooperation with: attilio, jhb, kib, mdf Discussed with: arch@, bde Tested by: Eugene Grosbein <eugen@grosbein.net>, gnn, Steven Hartland <killing@multiplay.co.uk>, glebius, Andrew Boyer <aboyer@averesystems.com> (various versions of the patch) MFC after: 3 months (or never)
# 227588	16-Nov-2011	pjd	Constify arguments for locking KPIs where possible. This enables locking consumers to pass their own structures around as const and be able to assert locks embedded into those structures. Reviewed by: ed, kib, jhb
# 227309	07-Nov-2011	ed	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
# 205626	24-Mar-2010	bz	Print the pointer to the lock with the panic message. The previous panic: rw lock not unlocked was not really helpful for debugging. Now one can at least call show lock <ptr> form ddb to learn more about the lock. MFC after: 3 days
# 197643	30-Sep-2009	attilio	When releasing a read/shared lock we need to use a write memory barrier in order to avoid, on architectures which doesn't have strong ordered writes, CPU instructions reordering. Diagnosed by: fabio Reviewed by: jhb Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
# 196334	17-Aug-2009	attilio	* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to a pointer-fetching specific operation check. Consequently, rename the operation ASSERT_ATOMIC_LOAD_PTR(). * Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking directly alignment on the word boundry, for all the given specific architectures. That's a bit too strict for some common case, but it assures safety. * Add a comment explaining the scope of the macro * Add a new stub in the lockmgr specific implementation Tested by: marcel (initial version), marius Reviewed by: rwatson, jhb (comment specific review) Approved by: re (kib)
# 196226	14-Aug-2009	bz	Add a new macro to test that a variable could be loaded atomically. Check that the given variable is at most uintptr_t in size and that it is aligned. Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate alignment -- however, the function of ALIGN() is to guarantee alignment, and therefore may lead to stronger alignment enforcement than necessary for types that are smaller than sizeof(uintptr_t). Add checks to mtx, rw and sx locks init functions to detect possible breakage. This was used during debugging of the problem fixed with r196118 where a pointer was on an un-aligned address in the dpcpu area. In collaboration with: rwatson Reviewed by: rwatson Approved by: re (kib)
# 193307	02-Jun-2009	attilio	Handle lock recursion differenty by always checking against LO_RECURSABLE instead the lock own flag itself. Tested by: pho
# 193037	29-May-2009	jhb	Remove extra cpu_spinwait() invocations. This should really only be used in tight spin loops, not in these edge cases where we restart a much larger loop only a few times. Reviewed by: attilio
# 193035	29-May-2009	jhb	Tweak a few comments on adaptive spinning.
# 192853	26-May-2009	sson	Add the OpenSolaris dtrace lockstat provider. The lockstat provider adds probes for mutexes, reader/writer and shared/exclusive locks to gather contention statistics and other locking information for dtrace scripts, the lockstat(1M) command and other potential consumers. Reviewed by: attilio jhb jb Approved by: gnn (mentor)
# 189846	15-Mar-2009	jeff	- Wrap lock profiling state variables in #ifdef LOCK_PROFILING blocks.
# 189074	26-Feb-2009	ed	Remove even more unneeded variable assignments. kern_time.c: - Unused variable `p'. kern_thr.c: - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread(). kern_sig.c: - Unused variable `code'. kern_synch.c: - `rval' is always assigned in all different cases. kern_rwlock.c: - `v' is always overwritten with RW_UNLOCKED further on. kern_malloc.c: - `size' is always initialized with the proper value before being used. kern_exit.c: - `error' is always caught and returned immediately. abort2() never returns a non-zero value. kern_exec.c: - `len' is always assigned inside the if-statement right below it. tty_info.c: - `td' is always overwritten by FOREACH_THREAD_IN_PROC(). Found by: LLVM's scan-build
# 185778	08-Dec-2008	kmacy	add RW_SYSINIT_FLAGS macro and rw_sysinit_flags initialization function
# 182914	10-Sep-2008	jhb	Teach WITNESS about the interlocks used with lockmgr. This removes a bunch of spurious witness warnings since lockmgr grew witness support. Before this, every time you passed an interlock to a lockmgr lock WITNESS treated it as a LOR. Reviewed by: attilio
# 182909	10-Sep-2008	jhb	Various whitespace fixes.
# 179334	26-May-2008	attilio	Improve a comment which, in the actual CVS stock, doesn't completely explain the logic of the code chunk.
# 177912	04-Apr-2008	jeff	- Add sysctls at debug.rwlock to control the behavior of the speculative spinning when readers hold a lock. This spinning is speculative because, unlike the write case, we can not test whether the owners are running. - Add speculative read spinning for readers who are blocked by pending writers while a read lock is still held. This allows the thread to spin until the write lock succeeds after which it may spin until the writer has released the lock. This prevents excessive context switches when readers and writers both hold the lock for brief periods. Sponsored by: Nokia
# 177843	01-Apr-2008	attilio	Add rw_try_rlock() and rw_try_wlock() to rwlocks. These functions try the specified operation (rlocking and wlocking) and true is returned if the operation completes, false otherwise. The KPI is enriched by this commit, so __FreeBSD_version bumping and manpage updating will happen soon. Requested by: jeff, kris
# 176076	07-Feb-2008	jeff	- In rw_wunlock_hard prefer to wakeup writers if there are both readers and writers available. Doing otherwise can cause deadlocks as no read locks can proceed while there are write waiters. Sponsored by: Nokia
# 176017	05-Feb-2008	jeff	Adaptive spinning in write path with readers and writer starvation avoidance. - Move recursion checking into rwlock inlines to free a bit for use with adaptive spinners. - Clear the RW_LOCK_WRITE_SPINNERS flag whenever the lock state changes causing write spinners to restart their loop. - Write spinners are limited by a count while readers hold the lock as there is no way to know for certain whether readers are running still. - In the read path block if there are write waiters or spinners to avoid starving writers. Use a new per-thread count, td_rw_rlocks, to skip starvation avoidance if it might cause a deadlock. - Remove or change invalid assertions in turnstiles. Reviewed by: attilio (developed parts of the patch as well) Sponsored by: Nokia
# 175411	17-Jan-2008	jhb	Remove a conditional that is always true. MFC after: 2 weeks
# 174629	15-Dec-2007	jeff	- Re-implement lock profiling in such a way that it no longer breaks the ABI when enabled. There is no longer an embedded lock_profile_object in each lock. Instead a list of lock_profile_objects is kept per-thread for each lock it may own. The cnt_hold statistic is now always 0 to facilitate this. - Support shared locking by tracking individual lock instances and statistics in the per-thread per-instance lock_profile_object. - Make the lock profiling hash table a per-cpu singly linked list with a per-cpu static lock_prof allocator. This removes the need for an array of spinlocks and reduces cache contention between cores. - Use a seperate hash for spinlocks and other locks so that only a critical_enter() is required and not a spinlock_enter() to modify the per-cpu tables. - Count time spent spinning in the lock statistics. - Remove the LOCK_PROFILE_SHARED option as it is always supported now. - Specifically drop and release the scheduler locks in both schedulers since we track owners now. In collaboration with: Kip Macy Sponsored by: Nokia
# 173960	26-Nov-2007	attilio	Simplify the adaptive spinning algorithm in rwlock and mutex: currently, before to spin the turnstile spinlock is acquired and the waiters flag is set. This is not strictly necessary, so just spin before to acquire the spinlock and to set the flags. This will simplify a lot other functions too, as now we have the waiters flag set only if there are actually waiters. This should make wakeup/sleeping couplet faster under intensive mutex workload. This also fixes a bug in rw_try_upgrade() in the adaptive case, where turnstile_lookup() will recurse on the ts_lock lock that will never be really released [1]. [1] Reported by: jeff with Nokia help Tested by: pho, kris (earlier, bugged version of rwlock part) Discussed with: jhb [2], jeff MFC after: 1 week [2] John had a similar patch about 6.x and/or 7.x about mutexes probabilly
# 173733	18-Nov-2007	attilio	Expand lock class with the "virtual" function lc_assert which will offer an unified way for all the lock primitives to express lock assertions. Currenty, lockmgrs and rmlocks don't have assertions, so just panic in that case. This will be a base for more callout improvements. Ok'ed by: jhb, jeff
# 173617	14-Nov-2007	attilio	Remove a bogus KASSERT which will prevent rwlock to be acquired recursively in exclusive mode with debugging kernels. Submitted by: kmacy Approved by: jeff
# 173600	14-Nov-2007	julian	generally we are interested in what thread did something as opposed to what process. Since threads by default have teh name of the process unless over-written with more useful information, just print the thread name instead.
# 171516	20-Jul-2007	attilio	Fix some problems with lock profiling in rw locks: - Adjust lock_profiling stubs semantic in the hard functions in order to be more accurate and trustable - As for sx locks, disable shared paths for lock_profiling. Actually, lock_profiling has a subtle race which makes results caming from shared paths not completely trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for re-enabling this paths, but is currently intended for developing use only. - style(9) fixes Approved by: jeff, kmacy, jhb[1] Approved by: re [1] Had initial reservations not shared by others, conceded in the end.
# 171052	26-Jun-2007	attilio	Introduce a new rwlocks initialization function: rw_init_flags. This is very similar to sx_init_flags: it initializes the rwlock using special flags passed as third argument (RW_DUPOK, RW_NOPROFILE, RW_NOWITNESS, RW_QUIET, RW_RECURSE). Among these, the most important new feature is probabilly that rwlocks can be acquired recursively now (for both shared and exclusive paths). Because of the recursion counter, the ABI is changed. Tested by: Timothy Redaelli <drizzt@gufi.org> Reviewed by: jhb Approved by: jeff (mentor) Approved by: re
# 170295	04-Jun-2007	jeff	Commit 3/14 of sched_lock decomposition. - Add a per-turnstile spinlock to solve potential priority propagation deadlocks that are possible with thread_lock(). - The turnstile lock order is defined as the exact opposite of the lock order used with the sleep locks they represent. This allows us to walk in reverse order in priority_propagate and this is the only place we wish to multiply acquire turnstile locks. - Use the turnstile_chain lock to protect assigning mutexes to turnstiles. - Change the turnstile interface to pass back turnstile pointers to the consumers. This allows us to reduce some locking and makes it easier to cancel turnstile assignment while the turnstile chain lock is held. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# 169675	18-May-2007	jhb	Move lock_profile_object_{init,destroy}() into lock_{init,destroy}().
# 169394	08-May-2007	jhb	Add destroyed cookie values for sx locks and rwlocks as well as extra KASSERTs so that any lock operations on a destroyed lock will panic or hang.
# 168073	30-Mar-2007	jhb	- Drop memory barriers in rw_try_upgrade(). We don't need an 'acq' memory barrier here as the earlier rw_rlock() already contained one. - Comment fix.
# 167801	22-Mar-2007	jhb	- Simplify the #ifdef's for adaptive mutexes and rwlocks by conditionally defining a macro earlier in the file. - Add NO_ADAPTIVE_RWLOCKS option to disable adaptive spinning for rwlocks.
# 167787	21-Mar-2007	jhb	Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes, rwlocks, and sx locks to 'lock_object'.
# 167504	13-Mar-2007	jhb	Print readers count as unsigned in ddb 'show lock'. Submitted by: attilio
# 167492	12-Mar-2007	jhb	Fix a typo.
# 167368	09-Mar-2007	jhb	Add two new function pointers 'lc_lock' and 'lc_unlock' to lock classes. These functions are intended to be used to drop a lock and then reacquire it when doing an sleep such as msleep(9). Both functions accept a 'struct lock_object *' as their first parameter. The 'lc_unlock' function returns an integer that is then passed as the second paramter to the subsequent 'lc_lock' function. This can be used to communicate state. For example, sx locks and rwlocks use this to indicate if the lock was share/read locked vs exclusive/write locked. Currently, spin mutexes and lockmgr locks do not provide working lc_lock and lc_unlock functions.
# 167365	09-Mar-2007	jhb	Use C99-style struct member initialization for lock classes.
# 167307	07-Mar-2007	jhb	Fix some nits in lock profiling for rwlocks: - Properly note when a read lock is released. - Always note when we contest on a read lock. - Only note success of obtaining read locks for the first reader to match the behavior of sx(9). Reviewed by: kmacy
# 167054	27-Feb-2007	kmacy	Further improvements to LOCK_PROFILING: - Fix missing initialization in kern_rwlock.c causing bogus times to be collected - Move updates to the lock hash to after the lock is released for spin mutexes, sleep mutexes, and sx locks - Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling statistics when an acquisition is contended. This reduces the overhead of LOCK_PROFILING to increasing system time by 20%-25% which on "make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms of wall-clock time. Contrast this to enabling lock profiling without LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.
# 167024	26-Feb-2007	rwatson	Add rw_wowned() interface to rwlock(9), allowing a kernel thread to determine if it holds an exclusive rwlock reference or not. This is non-ideal, but recursion scenarios in the network stack currently require it. Approved by: jhb
# 167012	26-Feb-2007	kmacy	general LOCK_PROFILING cleanup - only collect timestamps when a lock is contested - this reduces the overhead of collecting profiles from 20x to 5x - remove unused function from subr_lock.c - generalize cnt_hold and cnt_lock statistics to be kept for all locks - NOTE: rwlock profiling generates invalid statistics (and most likely always has) someone familiar with that should review
# 164246	13-Nov-2006	kmacy	track lock class name in a way that doesn't break WITNESS
# 164159	11-Nov-2006	kmacy	MUTEX_PROFILING has been generalized to LOCK_PROFILING. We now profile wait (time waited to acquire) and hold times for all kernel locks. If the architecture has a system synchronized TSC, the profiling code will use that - thereby minimizing profiling overhead. Large chunks of profiling code have been moved out of line, the overhead measured on the T1 for when it is compiled in but not enabled is < 1%. Approved by: scottl (standing in for mentor rwatson) Reviewed by: des and jhb
# 160771	27-Jul-2006	jhb	Adjust td_locks for non-spin mutexes, rwlocks, and sx locks so that it is a count of all non-spin locks, not just lockmgr locks. This can give us a much cheaper way to see if we have any locks held (such as when returning to userland via userret()) without requiring WITNESS. MFC after: 1 week
# 157882	19-Apr-2006	jhb	Implement rw_try_upgrade() and rw_downgrade(). rw_try_upgrade() makes a single attempt at upgrading a read lock to a write lock, and rw_downgrade() converts curthread's write lock into a read lock.
# 157851	18-Apr-2006	wkoszek	'owner' is not used without SMP. Fix kernel build for such kernel configurations. Approved by: jhb
# 157846	18-Apr-2006	jhb	Adaptively spin before blocking on the turnstile if an rwlock is write locked. In general the adaptive spinning is similar to the same code for mutexes with some extra trickiness in rw_wunlock_hard(). Specifically, even though both wait bits might be set and we might have a turnstile with at least one waiting thread, there might not be any threads blocked on the queue we are not waking up (they might all be spinning), and we should only preserve the waiting flag for the queue we aren't waking up if there are in fact threads blocked on that queue. Secondly, there might not be any threads blocked on the queue we have chosen to waken threads from (there might only be threads blocked on the other queue and the threads for this queue are all spinning) in which case we disown the turnstile instead of doing a braodcast and unpend.
# 157826	17-Apr-2006	jhb	- Add a rw_wowner() macro that just returns the owner of a write lock and use it in places that only care about the write owner instead of rw_owner() as a baby step towards limited read-lock owner. - Tidy the code that sets the WAITER flag bits to not duplicate a test around the atomic operation and the KTR trace in both of the lock functions.
# 155162	01-Feb-2006	scottl	Fix another compile problem. If I find any more, this file is going in the Attic until it is properly fixed.
# 155061	30-Jan-2006	scottl	Regroup order of operations to better reflect what was probably intended. Submitted by: Peter Jeremy
# 155012	29-Jan-2006	scottl	Take a stab at making this compile when WITNESS is not defined. gcc can't figure out the order of operations at line 519, and neither can I, but this is my best guess. Also correct a number of typos and syntax errors.
# 154973	29-Jan-2006	mlaier	Unbreak on archs where %d doesn't print uintptr_t arithmetic.
# 154941	27-Jan-2006	jhb	Add a basic reader/writer lock implementation to the kernel. This implementation is by no means perfect as far as some of the algorithms that it uses and the fact that it is missing some functionality (try locks and upgrades/downgrades are not there yet), however it does seem to work in my local testing. There is more detail in the comments in the code, but the short version follows. A reader/writer lock is very much like a regular mutex: it cannot be held across a voluntary sleep; it can be acquired in an interrupt thread; if the lock is held by a writer then the priority of any threads that block on the lock will be lent to the owner; the simple case lock operations all are done in a single atomic op. It also shares some similiarities with sx locks: it supports reader/writer semantics (multiple readers, but single writers); readers are allowed to recurse, but writers are not. We can extend this implementation further by either improving algorithms or adding new functionality, but this should at least give us a base to work with now. Reviewed by: arch (in theory) Tested on: i386 (4 cpu box with a kernel module that used 4 threads that randomly chose between read locks and write locks that ran w/o panicing for over a day solid. It usually panic'd within a few seconds when there were bugs during testing. :) The kernel module source is available on request.)