History log of /freebsd-11-stable/sys/kern/kern_rwlock.c
Revision Date Author Comments
# 367457 07-Nov-2020 dim

MFC r344855 (by jhb):

Drop "All rights reserved" from my copyright statements.

Reviewed by: rgrimes
Differential Revision: https://reviews.freebsd.org/D19485


# 341100 27-Nov-2018 vangyzen

MFC r340409

Make no assertions about lock state when the scheduler is stopped.

Change the assert paths in rm, rw, and sx locks to match the lock
and unlock paths. I did this for mutexes in r306346.

Reported by: Travis Lane <tlane@isilon.com>
Sponsored by: Dell EMC Isilon

# 334437 31-May-2018 mjg

MFC r329276,r329451,r330294,r330414,r330415,r330418,r331109,r332394,r332398,
r333831:

rwlock: diff-reduction of runlock compared to sx sunlock

==

Undo LOCK_PROFILING pessimisation after r313454 and r313455

With the option used to compile the kernel, both sx and rw shared ops would
always go to the slow path, which added avoidable overhead even when the
facility is disabled.

Furthermore the increased time spent doing uncontested shared lock acquire
would be bogusly added to total wait time, somewhat skewing the results.

Restore old behaviour of going there only when profiling is enabled.

This change is a no-op for kernels without LOCK_PROFILING (which is the
default).

==

sx: fix adaptive spinning broken in r327397

The condition was flipped.

In particular heavy multithreaded kernel builds on zfs started suffering
due to nested sx locks.

For instance make -s -j 128 buildkernel:

before: 3326.67s user 1269.62s system 6981% cpu 1:05.84 total
after: 3365.55s user 911.27s system 6871% cpu 1:02.24 total

==

locks: fix a corner case in r327399

If there were exactly rowner_retries/asx_retries (by default: 10) transitions
between read and write state and the waiters still did not get the lock, the
next owner -> reader transition would result in the code correctly falling
back to turnstile/sleepq where it would incorrectly think it was waiting
for a writer and decide to leave turnstile/sleepq to loop back. From this
point it would take ts/sq trips until the lock gets released.

The bug sometimes manifested itself in stalls during -j 128 package builds.

Refactor the code to fix the bug, while here remove some of the gratuitous
differences between rw and sx locks.

==

sx: don't do an atomic op in upgrade if it cannot succeed

The code already pays the cost of reading the lock to obtain the waiters
flag. Checking whether there is more than one reader is not a problem and
avoids dirtying the line.

This also fixes a small corner case: if waiters were to show up between
reading the flag and upgrading the lock, the operation would fail even
though it should not. No correctness change here though.

==

mtx: tidy up recursion handling in thread lock

Normally after grabbing the lock it has to be verified we got the right one
to begin with. However, if we are recursing, it must not have changed, so the
check can be avoided. In particular this avoids a lock read for the non-recursing
case which found out the lock was changed.

While here avoid an irq trip if this happens.

==

locks: slightly depessimize lockstat

The slow path is always taken when lockstat is enabled. This induces
rdtsc (or other) calls to get the cycle count even when there was no
contention.

Still go to the slow path to not mess with the fast path, but avoid
the heavy lifting unless necessary.

This reduces sys and real time during -j 80 buildkernel:
before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total

So note this is still a significant hit.

LOCK_PROFILING results are not affected.

==

rw: whack avoidable re-reads in try_upgrade

==

locks: extend speculative spin waiting for readers to drain

Now that 10 years have passed since the original limit of 10000 was
committed, bump it a little bit.

Spinning waiting for writers is semi-informed in the sense that we always
know if the owner is running and base the decision to spin on that.
However, no such information is provided for read-locking. In particular
this means that it is possible for a write-spinner to completely waste cpu
time waiting for the lock to be released, while the reader holding it was
preempted and is now waiting for the spinner to go off cpu.

Nonetheless, in the majority of cases it is an improvement to spin instead of
instantly giving up and going to sleep.

The current approach is pretty simple: snatch the number of current readers
and perform that many pauses before checking again. The total number of
pauses to execute is limited to 10k. If the lock is still not free by
that time, go to sleep.

Given the previously noted problem of not knowing whether spinning makes
any sense to begin with, the new limit has to remain rather conservative.
But at the very least it should also be related to the machine. Waiting
for writers uses parameters selected based on the number of activated
hardware threads. The upper limit of pause instructions to be executed
in between re-reads of the lock is typically 16384 or 32768. It was
selected as the limit of total spins. The lower bound is set to the
already present 10000 so as not to change it for smaller machines.

Bumping the limit reduces system time by a few % during benchmarks like
buildworld, buildkernel and others. Tested on 2 and 4 socket machines
(Broadwell, Skylake).

Figuring out how to make a more informed decision while not pessimizing
the fast path is left as an exercise for the reader.
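
For illustration, a minimal userland model of the spin-per-reader idea
described above (a sketch only; the function name, the pause budget and the
use of _mm_pause() on x86 in place of cpu_spinwait() are assumptions, not
the kernel code):

#include <stdatomic.h>
#include <immintrin.h>		/* _mm_pause(); assumes an x86 toolchain */

/*
 * Hypothetical model: pause roughly once per reader currently holding the
 * lock, capped by an overall budget, before giving up and blocking.
 */
static int
wait_for_readers_model(atomic_uint *readers, unsigned int max_pauses)
{
	unsigned int spent = 0;

	while (spent < max_pauses) {
		unsigned int n = atomic_load_explicit(readers,
		    memory_order_relaxed);
		if (n == 0)
			return (1);	/* drained; caller retries the acquire */
		for (unsigned int i = 0; i < n && spent < max_pauses; i++, spent++)
			_mm_pause();	/* stands in for cpu_spinwait() */
	}
	return (0);			/* budget exhausted; go to turnstile/sleepq */
}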

==

fix uninitialized variable warning in reader locks

Approved by: re (marius)

# 329380 16-Feb-2018 mjg

MFC r327875,r327905,r327914:

mtx: use fcmpset to cover setting MTX_CONTESTED

===

rwlock: try regular read unlock even in the hard path

Saves on turnstile trips if the lock got more readers.

===

sx: retry hard shared unlock just like in r327905 for rwlocks

# 327478 02-Jan-2018 mjg

MFC r324335,r327393,r327397,r327401,r327402:

locks: take the number of readers into account when waiting

Previous code would always spin once before checking the lock. But a lock
with e.g. 6 readers is not going to become free in the duration of one spin
even if they start draining immediately.

Conservatively perform one for each reader.

Note that the total number of allowed spins is still extremely small and is
subject to change later.

=============

rwlock: tidy up __rw_runlock_hard similarly to r325921

=============

sx: read the SX_NOADAPTIVE flag and Giant ownership only once

These used to be read multiple times when waiting for the lock to become
free, which had the potential to issue completely avoidable traffic.

=============

locks: re-check the reason to go to sleep after locking sleepq/turnstile

In both rw and sx locks we always go to sleep if the lock owner is not
running.

We do spin for some time if the lock is read-locked.

However, if we decide to go to sleep due to the lock owner being off cpu
and after sleepq/turnstile gets acquired the lock is read-locked, we should
fall back to the aforementioned wait.

=============

sx: fix up non-smp compilation after r327397

=============

locks: adjust loop limit check when waiting for readers

The check was for the exact value, but since the counter started being
incremented by the number of readers it could have jumped over.

=============

Return a non-NULL owner only if the lock is exclusively held in owner_sx().

Fix some whitespace bugs while here.

# 327413 31-Dec-2017 mjg

MFC r320561,r323236,r324041,r324314,r324609,r324613,r324778,r324780,r324787,
r324803,r324836,r325469,r325706,r325917,r325918,r325919,r325920,r325921,
r325922,r325925,r325963,r326106,r326107,r326110,r326111,r326112,r326194,
r326195,r326196,r326197,r326198,r326199,r326200,r326237:

rwlock: perform the typically false td_rw_rlocks check later

Check if the lock is available first instead.

=============

Sprinkle __read_frequently on few obvious places.

Note that some of the annotated variables should probably change their types
to something smaller, preferably bit-sized.

=============

mtx: drop the tid argument from _mtx_lock_sleep

tid must be equal to curthread and the target routine was already reading
it anyway, which is not a problem. Not passing it as a parameter allows for
a little bit shorter code in callers.

=============

locks: partially tidy up waiting on readers

spin first instead of instantly re-reading and don't re-read after
spinning is finished - the state is already known.

Note the code is subject to significant changes later.

=============

locks: take the number of readers into account when waiting

Previous code would always spin once before checking the lock. But a lock
with e.g. 6 readers is not going to become free in the duration of one spin
even if they start draining immediately.

Conservatively perform one for each reader.

Note that the total number of allowed spins is still extremely small and is
subject to change later.

=============

mtx: change MTX_UNOWNED from 4 to 0

The value is spread all over the kernel and zeroing a register is
cheaper/shorter than setting it up to an arbitrary value.

Reduces amd64 GENERIC-NODEBUG .text size by 0.4%.

=============

mtx: fix up owner_mtx after r324609

Now that MTX_UNOWNED is 0 the test was always false.

=============

mtx: clean up locking spin mutexes

1) shorten the fast path by pushing the lockstat probe to the slow path
2) test for kernel panic only after it turns out we will have to spin,
in particular test only after we know we are not recursing

=============

mtx: stop testing SCHEDULER_STOPPED in kabi funcs for spin mutexes

There is nothing panic-breaking to do in the unlock case and the lock
case will fallback to the slow path doing the check already.

=============

rwlock: reduce lockstat branches in the slowpath

=============

mtx: fix up UP build after r324778

=============

mtx: implement thread lock fastpath

=============

rwlock: fix up compilation without KDTRACE_HOOKS after r324787

=============

rwlock: use fcmpset for setting RW_LOCK_WRITE_SPINNER

=============

sx: avoid branches if in the slow path if lockstat is disabled

=============

rwlock: avoid branches in the slow path if lockstat is disabled

=============

locks: pull up PMC_SOFT_CALLs out of slow path loops

=============

mtx: unlock before traversing threads to wake up

This shortens the lock hold time while not affecting correctness.
All the woken up threads end up competing and can lose the race against
a completely unrelated thread getting the lock anyway.

=============

rwlock: unlock before traversing threads to wake up

While here perform a minor cleanup of the unlock path.

=============

sx: perform a minor cleanup of the unlock slowpath

No functional changes.

=============

mtx: add missing parts of the diff in r325920

Fixes build breakage.

=============

locks: fix compilation issues without SMP or KDTRACE_HOOKS

=============

locks: remove the file + line argument from internal primitives when not used

The pair is of use only in debug or LOCKPROF kernels, but was passed (zeroed)
for many locks even in production kernels.

While here whack the tid argument from wlock hard and xlock hard.

There is no kbi change of any sort - "external" primitives still accept the
pair.

=============

locks: pass the found lock value to unlock slow path

This avoids an explicit read later.

While here whack the cheaply obtainable 'tid' argument.

=============

rwlock: don't check for curthread's read lock count in the fast path

=============

rwlock: unbreak WITNESS builds after r326110

=============

sx: unbreak debug after r326107

An assertion was modified to use the found value, but it was not updated to
handle a race where blocked threads appear after the entrance to the func.

Move the assertion down to the area protected with sleepq lock where the
lock is read anyway. This does not affect coverage of the assertion and
is consistent with what rw locks are doing.

=============

rwlock: stop re-reading the owner when going to sleep

=============

locks: retry turnstile/sleepq loops on failed cmpset

In order to go to sleep threads set waiter flags, but that can spuriously
fail e.g. when a new reader arrives. Instead of unlocking everything and
looping back, re-evaluate the new state while still holding the lock necessary
to go to sleep.

=============

sx: change sunlock to wake waiters up if it locked sleepq

sleepq is only locked if the curthread is the last reader. By the time
the lock gets acquired new ones could have arrived. The previous code
would unlock and loop back. This results in spurious relocking of sleepq.

This is a step towards xadd-based unlock routine.

=============

rwlock: add __rw_try_{r,w}lock_int

=============

rwlock: fix up compilation of the previous change

committed the wrong version of the patch

=============

Convert in-kernel thread_lock_flags calls to thread_lock when debug is disabled

The flags argument is not used in this case.

=============

Add the missing lockstat check for thread lock.

=============

rw: fix runlock_hard when new readers show up

When waiters/writer spinner flags are set no new readers can show up unless
they already have a different rw lock read locked. The change in r326195 failed
to take that into account - in the presence of new readers it would spin until
they all drain, which would lead to trouble if e.g. they go off cpu and
cannot get scheduled because of this thread.

# 327409 31-Dec-2017 mjg

MFC r323235,r323236,r324789,r324863:

Introduce __read_frequently

While __read_mostly groups variables together, their placement is not
specified. In particular 2 frequently used variables can end up in
different cache lines.

This annotation is only expected to be used for variables read all the time,
e.g. on each syscall entry.

=============

Sprinkle __read_frequently on few obvious places.

Note that some of the annotated variables should probably change their types
to something smaller, preferably bit-sized.

=============

Mark kdb_active as __read_frequently and switch to bool to eat less space.

=============

Change kdb_active type to u_char.

Fixes warnings from gcc and keeps the small size. Perhaps nesting should be moved
to another variable.

# 326305 28-Nov-2017 markj

MFC r326060:
Clean up the SYSINIT_FLAGS definitions for rwlock(9) and rmlock(9).

# 320241 22-Jun-2017 markj

MFC r320124:
Fix the !TD_IS_IDLETHREAD(curthread) locking assertions.

Approved by: re (kib)

# 315394 16-Mar-2017 mjg

MFC r313855,r313865,r313875,r313877,r313878,r313901,r313908,r313928,r313944,r314185,r314476,r314187

locks: let primitives for modules unlock without always going to the slow path

It is only needed if LOCK_PROFILING is enabled. It has to always check if
the lock is about to be released, which requires an avoidable read if the option
is not specified.

==

sx: fix compilation on UP kernels after r313855

sx primitives use inlines as opposed to macros. Change the tested condition
to LOCK_DEBUG which covers the case, but is slightly overzealous.

==

locks: clean up trylock primitives

In particular this reduces accesses of the lock itself.

==

mtx: plug the 'opts' argument when not used

==

sx: fix mips build after r313855

The namespace in this file really needs cleaning up. In the meantime
let inline primitives be defined as long as LOCK_DEBUG is not enabled.

Reported by: kib

==

mtx: get rid of file/line args from slow paths if they are unused

This denotes changes which went in by accident in r313877.

On most production kernels both said parameters are zeroed and have nothing
reading them in either __mtx_lock_sleep or __mtx_unlock_sleep. Thus this change
stops passing them from internal consumers where this is the case.

Kernel modules use _flags variants which are not affected kbi-wise.

==

mtx: restrict r313875 to kernels without LOCK_PROFILING

==

mtx: microoptimize lockstat handling in __mtx_lock_sleep

This saves a function call and multiple branches after the lock is acquired.

# 315393 16-Mar-2017 mjg

MFC r313472:

The runlock slow path would update the wrong variable before restarting the
loop, in effect corrupting the state.

Something was botched in the previous mfc attempt in r315380.

# 315386 16-Mar-2017 mjg

MFC r313853,r313859:

locks: remove SCHEDULER_STOPPED checks from primitives for modules

They all fall back to the slow path if necessary and the check is there.

This means a panicked kernel executing code from modules will be able to
succeed doing actual lock/unlock, but this was already the case for core code
which has said primitives inlined.

==

Introduce SCHEDULER_STOPPED_TD for use when the thread pointer was already read

Sprinkle it in a few places.

# 315382 16-Mar-2017 mjg

MFC r313467:

locks: tidy up unlock fallback paths

Update comments to note these functions are reachable if lockstat is
enabled.

Check if the lock has any bits set before attempting unlock, which saves
an unnecessary atomic operation.

# 315380 16-Mar-2017 mjg

MFC r313454,r313472:

rwlock: implement rlock/runlock fast path

This improves singlethreaded throughput on my test machine from ~247 mln
ops/s to ~328 mln.

It is mostly about avoiding the setup cost of lockstat.

==

rwlock: fix r313454

The runlock slow path would update the wrong variable before restarting the
loop, in effect corrupting the state.

# 315379 16-Mar-2017 mjg

MFC r313392,r313784:

rwlock: implement RW_LOCK_WRITER_RECURSED bit

This moves recursion handling out of the inlined wunlock path and in
particular saves a read and a branch.

==

rwlock: tidy up r313392

While a new bit was added and thread alignment got shifted to accommodate it,
RW_READERS_SHIFT was not modified accordingly and clashed with the new flag.

This was surprisingly harmless. If the lock was taken for writing, other flags
were tested. If the lock was taken for reading, it would correctly work for
readers > 1 and this was the only relevant test performed.

# 315378 16-Mar-2017 mjg

MFC r313275,r313280,r313282,r313335:

mtx: move lockstat handling out of inline primitives

Lockstat requires checking if it is enabled and if so, calling a 6 argument
function. Further, determining whether to call it on unlock requires
pre-reading the lock value.

This is problematic in at least 3 ways:
- more branches in the hot path than necessary
- additional cacheline ping pong under contention
- bigger code

Instead, check first if lockstat handling is necessary and if so, just fall
back to regular locking routines. For this purpose a new macro is introduced
(LOCKSTAT_PROFILE_ENABLED).

LOCK_PROFILING uninlines all primitives. Fold in the current inline lock
variant into the _mtx_lock_flags to retain the support. With this change
the inline variants are not used when LOCK_PROFILING is defined and thus
can ignore its existence.

This results in:
text data bss dec hex filename
22259667 1303208 4994976 28557851 1b3c21b kernel.orig
21797315 1303208 4994976 28095499 1acb40b kernel.patched

i.e. about 3% reduction in text size.

A remaining action is to remove spurious arguments for internal kernel
consumers.
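
As an illustration of the pattern described here, a minimal self-contained
sketch (lockstat_enabled, lock_acquire_hard and the lock word layout are made
up for the example; only the shape of the check mirrors the new
LOCKSTAT_PROFILE_ENABLED test):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the pieces the message talks about. */
extern bool lockstat_enabled;					/* any probe armed? */
void lock_acquire_hard(atomic_uintptr_t *w, uintptr_t tid);	/* slow path */

#define	LOCK_UNOWNED	((uintptr_t)0)

/*
 * Fast path: if a lockstat probe is armed, skip the inline attempt entirely
 * and let the hard function account for the acquisition; otherwise try a
 * single cmpset and fall back only on failure.
 */
static inline void
lock_acquire(atomic_uintptr_t *w, uintptr_t tid)
{
	uintptr_t exp = LOCK_UNOWNED;

	if (lockstat_enabled ||
	    !atomic_compare_exchange_strong(w, &exp, tid))
		lock_acquire_hard(w, tid);
}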

==

sx: move lockstat handling out of inline primitives

See r313275 for details.

==

rwlock: move lockstat handling out of inline primitives

See r313275 for details.

One difference here is that recursion handling was removed from the fallback
routine. As it is it was never supposed to see a recursed lock in the first
place. Future changes will move it out of inline variants, but right now
there is no easy way to test if the lock is recursed without reading
additional words.

==

locks: fix recursion support after recent changes

When a relevant lockstat probe is enabled the fallback primitive is called with
a constant signifying a free lock. This works fine for typical cases but breaks
with recursion, since it checks if the passed value is that of the executing
thread.

Read the value if necessary.

# 315377 16-Mar-2017 mjg

MFC r313269,r313270,r313271,r313272,r313274,r313278,r313279,r313996,r314474

mtx: switch to fcmpset

The found value is passed to locking routines in order to reduce cacheline
accesses.

mtx_unlock grows an explicit check for regular unlock. On ll/sc architectures
the routine can fail even if the lock could have been handled by the inline
primitive.
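
The difference between cmpset and fcmpset can be modelled with C11 atomics,
which already behave like fcmpset (the helper below and its 'hard' callback
are hypothetical, not the kernel routines):

#include <stdatomic.h>
#include <stdint.h>

/*
 * atomic_compare_exchange_*() writes the value it found back into 'v' on
 * failure, just like fcmpset, so the slow path can be handed the observed
 * lock word instead of re-reading the (possibly contended) cacheline.
 */
static void
lock_fcmpset_model(atomic_uintptr_t *w, uintptr_t tid,
    void (*hard)(atomic_uintptr_t *, uintptr_t, uintptr_t))
{
	uintptr_t v = 0;		/* 0 stands for "unowned" in this model */

	if (!atomic_compare_exchange_strong(w, &v, tid))
		hard(w, tid, v);	/* 'v' now holds the found value */
}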

==

rwlock: switch to fcmpset

==

sx: switch to fcmpset

==

sx: uninline slock/sunlock

Shared locking routines explicitly read the value and test it. If the
change attempt fails, they fall back to a regular function which would
retry in a loop.

The problem is that with many concurrent readers the risk of failure is pretty
high and even the value returned by fcmpset is very likely going to be stale
by the time the loop in the fallback routine is reached.

Uninline said primitives. It gives a throughput increase when doing concurrent
slocks/sunlocks with 80 hardware threads from ~50 mln/s to ~56 mln/s.

Interestingly, rwlock primitives are already not inlined.

==

sx: add witness support missed in r313272

==

mtx: fix up _mtx_obtain_lock_fetch usage in thread lock

Since _mtx_obtain_lock_fetch no longer sets the argument to MTX_UNOWNED,
callers have to do it on their own.

==

mtx: fixup r313278, the assignment was supposed to go inside the loop

==

mtx: fix spin mutexes interaction with failed fcmpset

While doing so move recursion support down to the fallback routine.

==

locks: ensure proper barriers are used with atomic ops when necessary

Unclear how, but the locking routine for mutexes was using the *release*
barrier instead of acquire. This must have been either a copy-pasto or bad
completion.

Going through other uses of atomics shows no barriers in:
- upgrade routines (addressed in this patch)
- sections protected with turnstile locks - this should be fine as necessary
barriers are in the worst case provided by turnstile unlock

I would like to thank Mark Millard and andreast@ for reporting the problem and
testing previous patches before the issue got identified.

# 315341 16-Mar-2017 mjg

MFC r311172,r311194,r311226,r312389,r312390:

mtx: reduce lock accesses

Instead of spuriously re-reading the lock value, read it once.

This change also has a side effect of fixing a performance bug:
on failed _mtx_obtain_lock, it was possible that re-read would find
the lock is unowned, but in this case the primitive would make a trip
through turnstile code.

This is diff reduction to a variant which uses atomic_fcmpset.

==

Reduce lock accesses in thread lock similarly to r311172

==

mtx: plug open-coded mtx_lock access missed in r311172

==

rwlock: reduce lock accesses similarly to r311172

==

sx: reduce lock accesses similarly to r311172

# 315339 16-Mar-2017 mjg

MFC r312890,r313386,r313390:

Sprinkle __read_mostly on backoff and lock profiling code.

==

locks: change backoff to exponential

Previous implementation would use a random factor to spread readers and
reduce chances of starvation. This visibly reduces effectiveness of the
mechanism.

Switch to the more traditional exponential variant. Try to limit starvation
by imposing an upper limit of spins after which spinning is half of what
other threads get. Note the mechanism is turned off by default.
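
A toy model of the exponential variant (names and constants are invented;
the kernel's tuning lives in lock_delay and its autotuned limits):

/*
 * Double the pause count after each failed attempt, up to a cap, so that
 * contending threads back off progressively instead of hammering the lock
 * word on every iteration.
 */
struct delay_state {
	unsigned int delay;	/* pauses to perform on this attempt */
	unsigned int max;	/* upper bound, e.g. derived from CPU count */
};

static void
lock_delay_model(struct delay_state *ds)
{
	unsigned int i;

	if (ds->delay == 0)
		ds->delay = 1;
	else if (ds->delay < ds->max)
		ds->delay <<= 1;	/* exponential growth */
	for (i = 0; i < ds->delay; i++)
		__asm__ __volatile__("" ::: "memory");	/* placeholder for cpu_spinwait() */
}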

==

locks: follow up r313386

Unfinished diff was committed by accident. The loop in lock_delay
was changed to decrement, but the loop iterator was still incrementing.

# 303953 11-Aug-2016 mjg

MFC r303562,303563,r303584,r303643,r303652,r303655,r303707:

rwlock: s/READER/WRITER/ in wlock lockstat annotation

==

sx: increment spin_cnt before cpu_spinwait in xlock

The change is a no-op only done for consistency with the rest of the file.

==

locks: change sleep_cnt and spin_cnt types to u_int

Both variables are uint64_t, but they only count spins or sleeps.
All reasonable values which we can get here comfortably fit in the 32-bit range.

==

Implement trivial backoff for locking primitives.

All current spinning loops retry an atomic op the first chance they get,
which leads to performance degradation under load.

One classic solution to the problem consists of delaying the test to an
extent. This implementation has a trivial linear increment and a random
factor for each attempt.

For simplicity, this first touch implementation only modifies spinning
loops where the lock owner is running. Spin mutexes and thread lock were
not modified.

Current parameters are autotuned on boot based on mp_cpus.

Autotune factors are very conservative and are subject to change later.
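
For illustration, a compact model of that first-cut scheme (the base and cap
arguments are placeholders for the values autotuned from mp_cpus; the helper
name is invented):

#include <stdlib.h>

/*
 * Linear backoff with a random component: the number of pauses grows with
 * each failed attempt and the jitter spreads contending threads apart so
 * they do not retry in lockstep.  'base' must be nonzero.
 */
static unsigned int
backoff_pauses_model(unsigned int attempt, unsigned int base, unsigned int cap)
{
	unsigned int n;

	n = attempt * base + (unsigned int)rand() % base;
	return (n > cap ? cap : n);
}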

==

locks: fix up ifdef guards introduced in r303643

Both sx and rwlocks had copy-pasted ADAPTIVE_MUTEXES instead of the correct
define.

==

locks: fix compilation for KDTRACE_HOOKS && !ADAPTIVE_* case

==

locks: fix sx compilation on mips after r303643

The kernel.h header is required for the SYSINIT macro, which apparently
was present on amd64 by accident.

Approved by: re (gjb)

# 302408 08-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 301157 01-Jun-2016 mjg

Microoptimize locking primitives by avoiding unnecessary atomic ops.

Inline versions of the primitives do an atomic op and if it fails they fall back to
the actual primitives, which immediately retry the atomic op.

The obvious optimisation is to check if the lock is free and only then proceed
to do an atomic op.

Reviewed by: jhb, vangyzen
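
The optimisation in a nutshell, as a hedged userland model (not the actual
inline macros): peek at the lock word with a plain load and only issue the
cacheline-dirtying atomic when the lock looks free.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static inline bool
try_lock_model(atomic_uintptr_t *w, uintptr_t tid)
{
	/* Cheap read first; 0 stands for "unowned" in this model. */
	uintptr_t v = atomic_load_explicit(w, memory_order_relaxed);

	if (v != 0)
		return (false);		/* looks owned; skip the atomic op */
	return (atomic_compare_exchange_strong(w, &v, tid));
}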


# 286166 02-Aug-2015 markj

Don't modify curthread->td_locks unless INVARIANTS is enabled.

This field is only used in a KASSERT that verifies that no locks are held
when returning to user mode. Moreover, the td_locks accounting is only
correct when LOCK_DEBUG > 0, which is implied by INVARIANTS.

Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3205


# 285706 19-Jul-2015 markj

Don't increment the spin count until after the first attempt to acquire a
rwlock read lock. Otherwise the lockstat:::rw-spin probe will fire
spuriously.

MFC after: 1 week


# 285704 19-Jul-2015 markj

Consistently use a reader/writer flag for lockstat probes in rwlock(9) and
sx(9), rather than using the probe function name to determine whether a
given lock is a read lock or a write lock. Update lockstat(1) accordingly.


# 285703 19-Jul-2015 markj

Implement the lockstat provider using SDT(9) instead of the custom provider
in lockstat.ko. This means that lockstat probes now have typed arguments and
will utilize SDT probe hot-patching support when it arrives.

Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D2993


# 285664 18-Jul-2015 markj

Pass the lock object to lockstat_nsecs() and return immediately if
LO_NOPROFILE is set. Some timecounter handlers acquire a spin mutex, and
we don't want to recurse if lockstat probes are enabled.

PR: 201642
Reviewed by: avg
MFC after: 3 days


# 284297 12-Jun-2015 avg

several lockstat improvements

0. For spin events report time spent spinning, not a loop count.
While loop count is much easier and cheaper to obtain it is hard
to reason about the reported numbers, especially for adaptive locks
where both spinning and sleeping can happen.
So, it's better to compare apples and apples.

1. Teach lockstat about FreeBSD rw locks.
This is done in part by changing the corresponding probes
and in part by changing what probes lockstat should expect.

2. Teach lockstat that rw locks are adaptive and can spin on FreeBSD.

3. Report lock acquisition events for successful rw try-lock operations.

4. Teach lockstat about FreeBSD sx locks.
Reporting of events for those locks completely mirrors
rw locks.

5. Report spin and block events before acquisition event.
This is behavior documented for the upstream, so it makes sense to stick
to it. Note that because of FreeBSD adaptive lock implementations
both the spin and block events may be reported for the same acquisition
while the upstream reports only one of them.

Differential Revision: https://reviews.freebsd.org/D2727
Reviewed by: markj
MFC after: 17 days
Relnotes: yes
Sponsored by: ClusterHQ


# 275751 13-Dec-2014 dchagin

Add _NEW flag to mtx(9), sx(9), rmlock(9) and rwlock(9).
The _NEW flag is passed to _init_flags() to avoid the check for double-init.

Differential Revision: https://reviews.freebsd.org/D1208
Reviewed by: jhb, wblock
MFC after: 1 Month


# 274092 04-Nov-2014 jhb

Add a new thread state "spinning" to schedgraph and add tracepoints at the
start and stop of spinning waits in lock primitives.


# 261520 05-Feb-2014 jhb

Drop the 3rd clause from all 3 clause BSD licenses where I am the sole
holder to convert them to 2 clause BSD licenses.

MFC after: 1 week


# 259509 17-Dec-2013 attilio

- Assert for not leaking readers rw locks counter on userland return.
- Use a correct spin_cnt for KDTRACE_HOOK case in rw read lock.

Sponsored by: EMC / Isilon storage division


# 258541 25-Nov-2013 attilio

- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invocation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip


# 255788 22-Sep-2013 davide

Consistently use the same value to indicate exclusively-held and
shared-held locks for all the primitives in lc_lock/lc_unlock routines.
This fixes the problems introduced in r255747, which indeed introduced an
inversion in the logic.

Reported by: many
Tested by: bdrewery, pho, lme, Adam McDougall, O. Hartmann
Approved by: re (glebius)


# 255745 20-Sep-2013 davide

Fix lc_lock/lc_unlock() support for rmlocks held in shared mode. With
current lock classes KPI it was really difficult because there was no
way to pass an rmtracker object to the lock/unlock routines. In order
to accomplish the task, modify the aforementioned functions so that
they can return (or pass as argument) an uintptr_t, which is in the rm
case used to hold a pointer to struct rm_priotracker for the current
thread. As an added bonus, this fixes rm_sleep() in the rm shared
case, which right now can communicate priotracker structure between
lc_unlock()/lc_lock().

Suggested by: jhb
Reviewed by: jhb
Approved by: re (delphij)


# 252212 25-Jun-2013 jhb

A few mostly cosmetic nits to aid in debugging:
- Call lock_init() first before setting any lock_object fields in
lock init routines. This way if the machine panics due to a duplicate
init the lock's original state is preserved.
- Somewhat similarly, don't decrement td_locks and td_slocks until after
an unlock operation has completed successfully.


# 251323 03-Jun-2013 jhb

- Handle the recursed/not recursed flags with RA_RLOCKED in rw_assert().
- Tweak a panic message.


# 244582 22-Dec-2012 attilio

Fixup r240424: On entering KDB backends, the hijacked thread to run
interrupt context can still be idlethread. At that point, without the
panic condition, it can still happen that idlethread then will try to
acquire some locks to carry on some operations.

Skip the idlethread check on block/sleep lock operations when KDB is
active.

Reported by: jh
Tested by: jh
MFC after: 1 week


# 242515 03-Nov-2012 attilio

Merge r242395,242483 from mutex implementation:
give rwlock(9) the ability to crunch different type of structures, with
the only constraint that they have a lock cookie named rw_lock.
This name, then, becomes reserved for the struct that wants to use
the rwlock(9) KPI, and other locking primitives cannot reuse it for
their members.

Namely such structs are the current struct rwlock and the new struct
rwlock_padalign. The new structure will define an object which has the
same layout as a struct rwlock but will be allocated in areas aligned
to the cache line size and will be as big as a cache line.

For further details check comments on above mentioned revisions.

Reviewed by: jimharris, jeff


# 240475 13-Sep-2012 attilio

Remove all the checks on curthread != NULL with the exception of some MD
trap checks (eg. printtrap()).

Generally this check is not needed anymore, as there is not a legitimate
case where curthread == NULL after the pcpu 0 area has been properly
initialized.

Reviewed by: bde, jhb
MFC after: 1 week


# 240424 12-Sep-2012 attilio

Improve check coverage about idle threads.

Idle threads are not allowed to acquire any lock but spinlocks.
Deny any attempt to do so by panicking at the locking operation
when INVARIANTS is on. Then, remove the check on blocking on a
turnstile.
The check in sleepqueues is left because they are not allowed to use
tsleep() either which could happen still.

Reviewed by: bde, jhb, kib
MFC after: 1 week


# 233628 28-Mar-2012 fabient

Add software PMC support.

New kernel events can be added at various locations for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).

Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.

Sponsored by: NETASQ
MFC after: 1 month


# 228424 11-Dec-2011 avg

panic: add a switch and infrastructure for stopping other CPUs in SMP case

Historical behavior of letting other CPUs merrily go on is the default for
the time being. The new behavior can be switched on via
kern.stop_scheduler_on_panic tunable and sysctl.

Stopping of the CPUs has (at least) the following benefits:
- more of the system state at panic time is preserved intact
- threads and interrupts do not interfere with dumping of the system
state

Only one thread runs uninterrupted after panic if stop_scheduler_on_panic
is set. That thread might call code that is also used in normal context
and that code might use locks to prevent concurrent execution of certain
parts. Those locks might be held by the stopped threads and would never
be released. To work around this issue, it was decided that instead of
explicit checks for panic context, we would rather put those checks
inside the locking primitives.

This change has substantial portions written and re-written by attilio
and kib at various times. Other changes are heavily based on the ideas
and patches submitted by jhb and mdf. bde has provided many insights
into the details and history of the current code.

The new behavior may cause problems for systems that use a USB keyboard
for interfacing with system console. This is because of some unusual
locking patterns in the ukbd code which have to be used because on one
hand ukbd is below syscons, but on the other hand it has to interface
with other usb code that uses regular mutexes/Giant for its concurrency
protection. Dumping to USB-connected disks may also be affected.

PR: amd64/139614 (at least)
In cooperation with: attilio, jhb, kib, mdf
Discussed with: arch@, bde
Tested by: Eugene Grosbein <eugen@grosbein.net>,
gnn,
Steven Hartland <killing@multiplay.co.uk>,
glebius,
Andrew Boyer <aboyer@averesystems.com>
(various versions of the patch)
MFC after: 3 months (or never)


# 227588 16-Nov-2011 pjd

Constify arguments for locking KPIs where possible.

This enables locking consumers to pass their own structures around as const and
be able to assert locks embedded into those structures.

Reviewed by: ed, kib, jhb


# 227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 205626 24-Mar-2010 bz

Print the pointer to the lock with the panic message. The previous
panic: rw lock not unlocked
was not really helpful for debugging. Now one can at least call
show lock <ptr>
from ddb to learn more about the lock.

MFC after: 3 days


# 197643 30-Sep-2009 attilio

When releasing a read/shared lock we need to use a write memory barrier
in order to avoid, on architectures which don't have strongly ordered
writes, CPU instruction reordering.

Diagnosed by: fabio
Reviewed by: jhb
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>


# 196334 17-Aug-2009 attilio

* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to
a pointer-fetching specific operation check. Consequently, rename the
operation ASSERT_ATOMIC_LOAD_PTR().
* Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by directly checking
alignment on the word boundary, for all the given specific
architectures. That's a bit too strict for some common cases, but it
assures safety.
* Add a comment explaining the scope of the macro
* Add a new stub in the lockmgr specific implementation

Tested by: marcel (initial version), marius
Reviewed by: rwatson, jhb (comment specific review)
Approved by: re (kib)


# 196226 14-Aug-2009 bz

Add a new macro to test that a variable could be loaded atomically.
Check that the given variable is at most uintptr_t in size and that
it is aligned.

Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate
alignment -- however, the function of ALIGN() is to guarantee
alignment, and therefore may lead to stronger alignment
enforcement than necessary for types that are smaller than
sizeof(uintptr_t).

Add checks to mtx, rw and sx locks init functions to detect possible
breakage. This was used during debugging of the problem fixed with
r196118 where a pointer was on an un-aligned address in the dpcpu area.

In collaboration with: rwatson
Reviewed by: rwatson
Approved by: re (kib)
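
A rough userland equivalent of the check being described (the macro name is
invented; the kernel version panics via KASSERT rather than assert(3)):

#include <assert.h>
#include <stdint.h>

/*
 * The variable must be no larger than a pointer and naturally aligned,
 * otherwise a plain load of it cannot be assumed to be atomic.
 */
#define	ASSERT_ATOMIC_LOAD_MODEL(var)					\
	assert(sizeof(var) <= sizeof(uintptr_t) &&			\
	    ((uintptr_t)&(var) & (sizeof(var) - 1)) == 0)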


# 193307 02-Jun-2009 attilio

Handle lock recursion differently by always checking against LO_RECURSABLE
instead of the lock's own flag itself.

Tested by: pho


# 193037 29-May-2009 jhb

Remove extra cpu_spinwait() invocations. This should really only be used
in tight spin loops, not in these edge cases where we restart a much
larger loop only a few times.

Reviewed by: attilio


# 193035 29-May-2009 jhb

Tweak a few comments on adaptive spinning.


# 192853 26-May-2009 sson

Add the OpenSolaris dtrace lockstat provider. The lockstat provider
adds probes for mutexes, reader/writer and shared/exclusive locks to
gather contention statistics and other locking information for
dtrace scripts, the lockstat(1M) command and other potential
consumers.

Reviewed by: attilio jhb jb
Approved by: gnn (mentor)


# 189846 15-Mar-2009 jeff

- Wrap lock profiling state variables in #ifdef LOCK_PROFILING blocks.


# 189074 26-Feb-2009 ed

Remove even more unneeded variable assignments.

kern_time.c:
- Unused variable `p'.

kern_thr.c:
- Variable `error' is always caught immediately, so no reason to
initialize it. There is no way that error != 0 at the end of
create_thread().

kern_sig.c:
- Unused variable `code'.

kern_synch.c:
- `rval' is always assigned in all different cases.

kern_rwlock.c:
- `v' is always overwritten with RW_UNLOCKED further on.

kern_malloc.c:
- `size' is always initialized with the proper value before being used.

kern_exit.c:
- `error' is always caught and returned immediately. abort2() never
returns a non-zero value.

kern_exec.c:
- `len' is always assigned inside the if-statement right below it.

tty_info.c:
- `td' is always overwritten by FOREACH_THREAD_IN_PROC().

Found by: LLVM's scan-build


# 185778 08-Dec-2008 kmacy

add RW_SYSINIT_FLAGS macro and rw_sysinit_flags initialization function


# 182914 10-Sep-2008 jhb

Teach WITNESS about the interlocks used with lockmgr. This removes a bunch
of spurious witness warnings since lockmgr grew witness support. Before
this, every time you passed an interlock to a lockmgr lock WITNESS treated
it as a LOR.

Reviewed by: attilio


# 182909 10-Sep-2008 jhb

Various whitespace fixes.


# 179334 27-May-2008 attilio

Improve a comment which, in the actual CVS stock, doesn't completely
explain the logic of the code chunk.


# 177912 04-Apr-2008 jeff

- Add sysctls at debug.rwlock to control the behavior of the speculative
spinning when readers hold a lock. This spinning is speculative because,
unlike the write case, we can not test whether the owners are running.
- Add speculative read spinning for readers who are blocked by pending
writers while a read lock is still held. This allows the thread to
spin until the write lock succeeds after which it may spin until the
writer has released the lock. This prevents excessive context switches
when readers and writers both hold the lock for brief periods.

Sponsored by: Nokia


# 177843 01-Apr-2008 attilio

Add rw_try_rlock() and rw_try_wlock() to rwlocks.
These functions try the specified operation (rlocking and wlocking) and
true is returned if the operation completes, false otherwise.

The KPI is enriched by this commit, so __FreeBSD_version bumping and
manpage updating will happen soon.

Requested by: jeff, kris
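
Typical use of the new try variants, as a hypothetical consumer (foo_lock and
foo_defer() are made up; the rw_* calls are the documented rwlock(9) KPI):

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rwlock.h>

static struct rwlock foo_lock;		/* assumed rw_init()ed elsewhere */
static void foo_defer(void);		/* hypothetical "try again later" hook */

static void
foo_poll(void)
{
	/* Cannot sleep here, so only take the lock if it is immediately free. */
	if (!rw_try_rlock(&foo_lock)) {
		foo_defer();
		return;
	}
	/* ... read the shared state ... */
	rw_runlock(&foo_lock);
}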


# 176076 07-Feb-2008 jeff

- In rw_wunlock_hard prefer to wakeup writers if there are both readers
and writers available. Doing otherwise can cause deadlocks as no
read locks can proceed while there are write waiters.

Sponsored by: Nokia


# 176017 06-Feb-2008 jeff

Adaptive spinning in write path with readers and writer starvation avoidance.
- Move recursion checking into rwlock inlines to free a bit for use with
adaptive spinners.
- Clear the RW_LOCK_WRITE_SPINNERS flag whenever the lock state changes
causing write spinners to restart their loop.
- Write spinners are limited by a count while readers hold the lock as
there is no way to know for certain whether readers are running still.
- In the read path block if there are write waiters or spinners to avoid
starving writers. Use a new per-thread count, td_rw_rlocks, to skip
starvation avoidance if it might cause a deadlock.
- Remove or change invalid assertions in turnstiles.

Reviewed by: attilio (developed parts of the patch as well)
Sponsored by: Nokia


# 175411 17-Jan-2008 jhb

Remove a conditional that is always true.

MFC after: 2 weeks


# 174629 15-Dec-2007 jeff

- Re-implement lock profiling in such a way that it no longer breaks
the ABI when enabled. There is no longer an embedded lock_profile_object
in each lock. Instead a list of lock_profile_objects is kept per-thread
for each lock it may own. The cnt_hold statistic is now always 0 to
facilitate this.
- Support shared locking by tracking individual lock instances and
statistics in the per-thread per-instance lock_profile_object.
- Make the lock profiling hash table a per-cpu singly linked list with a
per-cpu static lock_prof allocator. This removes the need for an array
of spinlocks and reduces cache contention between cores.
- Use a separate hash for spinlocks and other locks so that only a
critical_enter() is required and not a spinlock_enter() to modify the
per-cpu tables.
- Count time spent spinning in the lock statistics.
- Remove the LOCK_PROFILE_SHARED option as it is always supported now.
- Specifically drop and release the scheduler locks in both schedulers
since we track owners now.

In collaboration with: Kip Macy
Sponsored by: Nokia


# 173960 26-Nov-2007 attilio

Simplify the adaptive spinning algorithm in rwlock and mutex:
currently, before spinning, the turnstile spinlock is acquired and the
waiters flag is set.
This is not strictly necessary, so just spin before acquiring the
spinlock and setting the flags.
This will simplify a lot other functions too, as now we have the waiters
flag set only if there are actually waiters.
This should make wakeup/sleeping couplet faster under intensive mutex
workload.
This also fixes a bug in rw_try_upgrade() in the adaptive case, where
turnstile_lookup() will recurse on the ts_lock lock that will never be
really released [1].

[1] Reported by: jeff with Nokia help
Tested by: pho, kris (earlier, bugged version of rwlock part)
Discussed with: jhb [2], jeff
MFC after: 1 week

[2] John had a similar patch about 6.x and/or 7.x about mutexes probably


# 173733 18-Nov-2007 attilio

Expand lock class with the "virtual" function lc_assert which will offer
an unified way for all the lock primitives to express lock assertions.
Currently, lockmgrs and rmlocks don't have assertions, so just panic in
that case.
This will be a base for more callout improvements.

Ok'ed by: jhb, jeff


# 173617 14-Nov-2007 attilio

Remove a bogus KASSERT which would prevent rwlocks from being acquired
recursively in exclusive mode with debugging kernels.

Submitted by: kmacy
Approved by: jeff


# 173600 14-Nov-2007 julian

generally we are interested in what thread did something as
opposed to what process. Since threads by default have the name of the
process unless over-written with more useful information, just print the
thread name instead.


# 171516 20-Jul-2007 attilio

Fix some problems with lock profiling in rw locks:
- Adjust lock_profiling stubs semantic in the hard functions in order to be
more accurate and trustable
- As for sx locks, disable shared paths for lock_profiling. Actually,
lock_profiling has a subtle race which makes results coming from shared
paths not completely trustable. A macro stub (LOCK_PROFILING_SHARED) can
be actually used for re-enabling these paths, but is currently intended
for developing use only.
- style(9) fixes

Approved by: jeff, kmacy, jhb[1]
Approved by: re

[1] Had initial reservations not shared by others, conceded
in the end.


# 171052 26-Jun-2007 attilio

Introduce a new rwlocks initialization function: rw_init_flags.
This is very similar to sx_init_flags: it initializes the rwlock using
special flags passed as third argument (RW_DUPOK, RW_NOPROFILE,
RW_NOWITNESS, RW_QUIET, RW_RECURSE).
Among these, the most important new feature is probabilly that rwlocks
can be acquired recursively now (for both shared and exclusive paths).

Because of the recursion counter, the ABI is changed.

Tested by: Timothy Redaelli <drizzt@gufi.org>
Reviewed by: jhb
Approved by: jeff (mentor)
Approved by: re


# 170295 04-Jun-2007 jeff

Commit 3/14 of sched_lock decomposition.
- Add a per-turnstile spinlock to solve potential priority propagation
deadlocks that are possible with thread_lock().
- The turnstile lock order is defined as the exact opposite of the
lock order used with the sleep locks they represent. This allows us
to walk in reverse order in priority_propagate and this is the only
place we wish to multiply acquire turnstile locks.
- Use the turnstile_chain lock to protect assigning mutexes to turnstiles.
- Change the turnstile interface to pass back turnstile pointers to the
consumers. This allows us to reduce some locking and makes it easier
to cancel turnstile assignment while the turnstile chain lock is held.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 169675 18-May-2007 jhb

Move lock_profile_object_{init,destroy}() into lock_{init,destroy}().


# 169394 08-May-2007 jhb

Add destroyed cookie values for sx locks and rwlocks as well as extra
KASSERTs so that any lock operations on a destroyed lock will panic or
hang.


# 168073 30-Mar-2007 jhb

- Drop memory barriers in rw_try_upgrade(). We don't need an 'acq' memory
barrier here as the earlier rw_rlock() already contained one.
- Comment fix.


# 167801 22-Mar-2007 jhb

- Simplify the #ifdef's for adaptive mutexes and rwlocks by conditionally
defining a macro earlier in the file.
- Add NO_ADAPTIVE_RWLOCKS option to disable adaptive spinning for rwlocks.


# 167787 21-Mar-2007 jhb

Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.


# 167504 13-Mar-2007 jhb

Print readers count as unsigned in ddb 'show lock'.

Submitted by: attilio


# 167492 12-Mar-2007 jhb

Fix a typo.


# 167368 09-Mar-2007 jhb

Add two new function pointers 'lc_lock' and 'lc_unlock' to lock classes.
These functions are intended to be used to drop a lock and then reacquire
it when doing an sleep such as msleep(9). Both functions accept a
'struct lock_object *' as their first parameter. The 'lc_unlock' function
returns an integer that is then passed as the second paramter to the
subsequent 'lc_lock' function. This can be used to communicate state.
For example, sx locks and rwlocks use this to indicate if the lock was
share/read locked vs exclusive/write locked.

Currently, spin mutexes and lockmgr locks do not provide working lc_lock
and lc_unlock functions.
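
A hypothetical sketch of the pattern (not the actual lock_class declaration;
per the description here the cookie is an int, and a later revision in this
log widens it to uintptr_t):

/* Model of a lock class carrying the two new methods. */
struct lock_ops_model {
	int	(*lc_unlock)(void *lock);		/* returns a state cookie */
	void	(*lc_lock)(void *lock, int how);	/* consumes the cookie */
};

/*
 * A sleep primitive can drop the caller's lock, block, and then restore it
 * in the same mode (e.g. shared vs. exclusive) using the returned cookie.
 */
static void
sleep_then_relock_model(void *lock, const struct lock_ops_model *ops)
{
	int how;

	how = ops->lc_unlock(lock);	/* remember how the lock was held */
	/* ... go to sleep ... */
	ops->lc_lock(lock, how);	/* reacquire in the same mode */
}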


# 167365 09-Mar-2007 jhb

Use C99-style struct member initialization for lock classes.


# 167307 07-Mar-2007 jhb

Fix some nits in lock profiling for rwlocks:
- Properly note when a read lock is released.
- Always note when we contest on a read lock.
- Only note success of obtaining read locks for the first reader to match
the behavior of sx(9).

Reviewed by: kmacy


# 167054 27-Feb-2007 kmacy

Further improvements to LOCK_PROFILING:
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
- Move updates to the lock hash to after the lock is released for spin mutexes,
sleep mutexes, and sx locks
- Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
statistics when an acquisition is contended. This reduces the overhead of
LOCK_PROFILING to increasing system time by 20%-25% which on
"make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
of wall-clock time. Contrast this to enabling lock profiling without
LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.


# 167024 26-Feb-2007 rwatson

Add rw_wowned() interface to rwlock(9), allowing a kernel thread to
determine if it holds an exclusive rwlock reference or not. This is
non-ideal, but recursion scenarios in the network stack currently
require it.

Approved by: jhb


# 167012 26-Feb-2007 kmacy

general LOCK_PROFILING cleanup

- only collect timestamps when a lock is contested - this reduces the overhead
of collecting profiles from 20x to 5x

- remove unused function from subr_lock.c

- generalize cnt_hold and cnt_lock statistics to be kept for all locks

- NOTE: rwlock profiling generates invalid statistics (and most likely always has);
someone familiar with that code should review it


# 164246 13-Nov-2006 kmacy

track lock class name in a way that doesn't break WITNESS


# 164159 11-Nov-2006 kmacy

MUTEX_PROFILING has been generalized to LOCK_PROFILING. We now profile
wait (time waited to acquire) and hold times for *all* kernel locks. If
the architecture has a system synchronized TSC, the profiling code will
use that - thereby minimizing profiling overhead. Large chunks of profiling
code have been moved out of line, the overhead measured on the T1 for when
it is compiled in but not enabled is < 1%.

Approved by: scottl (standing in for mentor rwatson)
Reviewed by: des and jhb


# 160771 27-Jul-2006 jhb

Adjust td_locks for non-spin mutexes, rwlocks, and sx locks so that it is
a count of all non-spin locks, not just lockmgr locks. This can give us a
much cheaper way to see if we have any locks held (such as when returning
to userland via userret()) without requiring WITNESS.

MFC after: 1 week


# 157882 19-Apr-2006 jhb

Implement rw_try_upgrade() and rw_downgrade(). rw_try_upgrade() makes a
single attempt at upgrading a read lock to a write lock, and rw_downgrade()
converts curthread's write lock into a read lock.
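
A hedged usage sketch (the lock, the cache and its helper functions are
hypothetical, not from the tree): read-lock by default and escalate only
when an update turns out to be necessary.

    rw_rlock(&cache_lock);
    entry = cache_lookup(key);
    if (entry != NULL && entry_is_stale(entry)) {
        if (!rw_try_upgrade(&cache_lock)) {
            /* Upgrade failed; the read lock is still held.  Drop and relock. */
            rw_runlock(&cache_lock);
            rw_wlock(&cache_lock);
            /* The cache may have changed while unlocked; look up again. */
            entry = cache_lookup(key);
        }
        if (entry != NULL)
            entry_refresh(entry);
        rw_downgrade(&cache_lock);  /* back to a read lock */
    }
    rw_runlock(&cache_lock);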


# 157851 18-Apr-2006 wkoszek

'owner' is not used without SMP. Fix kernel build for such kernel
configurations.

Approved by: jhb


# 157846 18-Apr-2006 jhb

Adaptively spin before blocking on the turnstile if an rwlock is write
locked. In general the adaptive spinning is similar to the same code
for mutexes with some extra trickiness in rw_wunlock_hard(). Specifically,
even though both wait bits might be set and we might have a turnstile with
at least one waiting thread, there might not be any threads blocked on the
queue we are not waking up (they might all be spinning), and we should
only preserve the waiting flag for the queue we aren't waking up if there
are in fact threads blocked on that queue. Secondly, there might not be
any threads blocked on the queue we have chosen to waken threads from
(there might only be threads blocked on the other queue and the threads
for this queue are all spinning) in which case we disown the turnstile
instead of doing a broadcast and unpend.
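
The adaptive part reduces to roughly the following (an illustrative fragment
with made-up helpers; the real loop in kern_rwlock.c carries many more checks):

    struct thread *owner;

    while (rw_is_write_locked_by_other(rw)) {   /* hypothetical helper */
        owner = rw_write_owner(rw);             /* hypothetical helper */
        if (owner == NULL || !TD_IS_RUNNING(owner))
            break;          /* owner is off cpu: block on the turnstile */
        cpu_spinwait();     /* owner is running: keep spinning */
    }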


# 157826 17-Apr-2006 jhb

- Add a rw_wowner() macro that just returns the owner of a write lock and
use it in places that only care about the write owner instead of
rw_owner() as a baby step towards limited read-lock owner.
- Tidy the code that sets the WAITER flag bits to not duplicate a test
around the atomic operation and the KTR trace in both of the lock
functions.


# 155162 01-Feb-2006 scottl

Fix another compile problem. If I find any more, this file is going in the
Attic until it is properly fixed.


# 155061 30-Jan-2006 scottl

Regroup order of operations to better reflect what was probably intended.

Submitted by: Peter Jeremy


# 155012 29-Jan-2006 scottl

Take a stab at making this compile when WITNESS is not defined. gcc can't
figure out the order of operations at line 519, and neither can I, but this
is my best guess. Also correct a number of typos and syntax errors.


# 154973 29-Jan-2006 mlaier

Unbreak on archs where %d doesn't print uintptr_t arithmetic.


# 154941 27-Jan-2006 jhb

Add a basic reader/writer lock implementation to the kernel. This
implementation is by no means perfect as far as some of the algorithms
that it uses and the fact that it is missing some functionality (try
locks and upgrades/downgrades are not there yet), however it does seem
to work in my local testing. There is more detail in the comments in the
code, but the short version follows.

A reader/writer lock is very much like a regular mutex: it cannot be held
across a voluntary sleep; it can be acquired in an interrupt thread; if
the lock is held by a writer then the priority of any threads that block
on the lock will be lent to the owner; the simple case lock operations all
are done in a single atomic op. It also shares some similarities
with sx locks: it supports reader/writer semantics (multiple readers,
but single writers); readers are allowed to recurse, but writers are not.

We can extend this implementation further by either improving algorithms
or adding new functionality, but this should at least give us a base to
work with now.

Reviewed by: arch (in theory)
Tested on: i386 (4 cpu box with a kernel module that used 4 threads
that randomly chose between read locks and write locks
that ran w/o panicking for over a day solid. It usually
panic'd within a few seconds when there were bugs during
testing. :) The kernel module source is available on
request.)


# 329380 16-Feb-2018 mjg

MFC r327875,r327905,r327914:

mtx: use fcmpset to cover setting MTX_CONTESTED

===

rwlock: try regular read unlock even in the hard path

Saves on turnstile trips if the lock got more readers.

===

sx: retry hard shared unlock just like in r327905 for rwlocks


# 327478 02-Jan-2018 mjg

MFC r324335,r327393,r327397,r327401,r327402:

locks: take the number of readers into account when waiting

Previous code would always spin once before checking the lock. But a lock
with e.g. 6 readers is not going to become free in the duration of one spin
even if they start draining immediately.

Conservatively perform one for each reader.

Note that the total number of allowed spins is still extremely small and is
subject to change later.
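
Schematically (an illustrative fragment, not the committed code; RW_READERS()
is the usual lock-word decoding and 'v' is the last observed lock value):

    u_int i, readers;

    readers = RW_READERS(v);                /* readers currently holding the lock */
    for (i = 0; i < readers; i++)
        cpu_spinwait();                     /* one pause per reader */
    v = atomic_load_acq_ptr(&rw->rw_lock);  /* re-read and re-evaluate */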

=============

rwlock: tidy up __rw_runlock_hard similarly to r325921

=============

sx: read the SX_NOADAPTIVE flag and Giant ownership only once

These used to be read multiple times when waiting for the lock to become
free, which had the potential to issue completely avoidable traffic.

=============

locks: re-check the reason to go to sleep after locking sleepq/turnstile

In both rw and sx locks we always go to sleep if the lock owner is not
running.

We do spin for some time if the lock is read-locked.

However, if we decide to go to sleep due to the lock owner being off cpu
and after sleepq/turnstile gets acquired the lock is read-locked, we should
fall back to the aforementioned wait.

=============

sx: fix up non-smp compilation after r327397

=============

locks: adjust loop limit check when waiting for readers

The check was for the exact value, but since the counter started being
incremented by the number of readers it could have jumped over.

=============

Return a non-NULL owner only if the lock is exclusively held in owner_sx().

Fix some whitespace bugs while here.


# 327413 31-Dec-2017 mjg

MFC r320561,r323236,r324041,r324314,r324609,r324613,r324778,r324780,r324787,
r324803,r324836,r325469,r325706,r325917,r325918,r325919,r325920,r325921,
r325922,r325925,r325963,r326106,r326107,r326110,r326111,r326112,r326194,
r326195,r326196,r326197,r326198,r326199,r326200,r326237:

rwlock: perform the typically false td_rw_rlocks check later

Check if the lock is available first instead.

=============

Sprinkle __read_frequently on few obvious places.

Note that some of the annotated variables should probably change their types
to something smaller, preferably bit-sized.

=============

mtx: drop the tid argument from _mtx_lock_sleep

tid must be equal to curthread and the target routine was already reading
it anyway, which is not a problem. Not passing it as a parameter allows for
a little bit shorter code in callers.

=============

locks: partially tidy up waiting on readers

spin first instead of instantly re-reading and don't re-read after
spinning is finished - the state is already known.

Note the code is subject to significant changes later.

=============

locks: take the number of readers into account when waiting

Previous code would always spin once before checking the lock. But a lock
with e.g. 6 readers is not going to become free in the duration of one spin
even if they start draining immediately.

Conservatively perform one for each reader.

Note that the total number of allowed spins is still extremely small and is
subject to change later.

=============

mtx: change MTX_UNOWNED from 4 to 0

The value is spread all over the kernel and zeroing a register is
cheaper/shorter than setting it up to an arbitrary value.

Reduces amd64 GENERIC-NODEBUG .text size by 0.4%.

=============

mtx: fix up owner_mtx after r324609

Now that MTX_UNOWNED is 0 the test was always false.

=============

mtx: clean up locking spin mutexes

1) shorten the fast path by pushing the lockstat probe to the slow path
2) test for kernel panic only after it turns out we will have to spin,
in particular test only after we know we are not recursing

=============

mtx: stop testing SCHEDULER_STOPPED in kabi funcs for spin mutexes

There is nothing panic-breaking to do in the unlock case and the lock
case will fall back to the slow path doing the check already.

=============

rwlock: reduce lockstat branches in the slowpath

=============

mtx: fix up UP build after r324778

=============

mtx: implement thread lock fastpath

=============

rwlock: fix up compilation without KDTRACE_HOOKS after r324787

=============

rwlock: use fcmpset for setting RW_LOCK_WRITE_SPINNER

=============

sx: avoid branches if in the slow path if lockstat is disabled

=============

rwlock: avoid branches in the slow path if lockstat is disabled

=============

locks: pull up PMC_SOFT_CALLs out of slow path loops

=============

mtx: unlock before traversing threads to wake up

This shortens the lock hold time while not affecting correctness.
All the woken up threads end up competing and can lose the race against
a completely unrelated thread getting the lock anyway.

=============

rwlock: unlock before traversing threads to wake up

While here perform a minor cleanup of the unlock path.

=============

sx: perform a minor cleanup of the unlock slowpath

No functional changes.

=============

mtx: add missing parts of the diff in r325920

Fixes build breakage.

=============

locks: fix compilation issues without SMP or KDTRACE_HOOKS

=============

locks: remove the file + line argument from internal primitives when not used

The pair is of use only in debug or LOCKPROF kernels, but was passed (zeroed)
for many locks even in production kernels.

While here whack the tid argument from wlock hard and xlock hard.

There is no kbi change of any sort - "external" primitives still accept the
pair.

=============

locks: pass the found lock value to unlock slow path

This avoids an explicit read later.

While here whack the cheaply obtainable 'tid' argument.

=============

rwlock: don't check for curthread's read lock count in the fast path

=============

rwlock: unbreak WITNESS builds after r326110

=============

sx: unbreak debug after r326107

An assertion was modified to use the found value, but it was not updated to
handle a race where blocked threads appear after entering the function.

Move the assertion down to the area protected with sleepq lock where the
lock is read anyway. This does not affect coverage of the assertion and
is consistent with what rw locks are doing.

=============

rwlock: stop re-reading the owner when going to sleep

=============

locks: retry turnstile/sleepq loops on failed cmpset

In order to go to sleep threads set waiter flags, but that can spuriously
fail e.g. when a new reader arrives. Instead of unlocking everything and
looping back, re-evaluate the new state while still holding the lock necessary
to go to sleep.

=============

sx: change sunlock to wake waiters up if it locked sleepq

sleepq is only locked if the curthread is the last reader. By the time
the lock gets acquired new ones could have arrived. The previous code
would unlock and loop back. This results in spurious relocking of sleepq.

This is a step towards xadd-based unlock routine.

=============

rwlock: add __rw_try_{r,w}lock_int

=============

rwlock: fix up compilation of the previous change

committed the wrong version of the patch

=============

Convert in-kernel thread_lock_flags calls to thread_lock when debug is disabled

The flags argument is not used in this case.

=============

Add the missing lockstat check for thread lock.

=============

rw: fix runlock_hard when new readers show up

When waiters/writer spinner flags are set no new readers can show up unless
they already have a different rw lock read locked. The change in r326195 failed
to take that into account - in presence of new readers it would spin until
they all drain, which would lead to trouble if e.g. they go off cpu and
cannot get scheduled because of this thread.


# 327409 31-Dec-2017 mjg

MFC r323235,r323236,r324789,r324863:

Introduce __read_frequently

While __read_mostly groups variables together, their placement is not
specified. In particular 2 frequently used variables can end up in
different cache lines.

This annotation is only expected to be used for variables read all the time,
e.g. on each syscall entry.

=============

Sprinkle __read_frequently on few obvious places.

Note that some of the annotated variables should probably change their types
to something smaller, preferably bit-sized.

=============

Mark kdb_active as __read_frequently and switch to bool to eat less space.

=============

Change kdb_active type to u_char.

Fixes warnings from gcc and keeps the small size. Perhaps nesting should be moved
to another variable.


# 326305 28-Nov-2017 markj

MFC r326060:
Clean up the SYSINIT_FLAGS definitions for rwlock(9) and rmlock(9).


# 320241 22-Jun-2017 markj

MFC r320124:
Fix the !TD_IS_IDLETHREAD(curthread) locking assertions.

Approved by: re (kib)


# 315394 16-Mar-2017 mjg

MFC r313855,r313865,r313875,r313877,r313878,r313901,r313908,r313928,r313944,r314185,r314476,r314187

locks: let primitives for modules unlock without always going to the slow path

It is only needed if LOCK_PROFILING is enabled. It has to always check if
the lock is about to be released, which requires an avoidable read if the option
is not specified.

==

sx: fix compilation on UP kernels after r313855

sx primitives use inlines as opposed to macros. Change the tested condition
to LOCK_DEBUG which covers the case, but is slightly overzealous.

commit a39b839d16cd72b1df284ccfe6706fcdf362706e
Author: mjg <mjg@ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f>
Date: Sat Feb 18 22:06:03 2017 +0000

locks: clean up trylock primitives

In particular this reduces accesses of the lock itself.

git-svn-id: svn+ssh://svn.freebsd.org/base/head@313928 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

commit 013560e742a5a276b0deef039bc18078d51d6eb0
Author: mjg <mjg@ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f>
Date: Sat Feb 18 01:52:10 2017 +0000

mtx: plug the 'opts' argument when not used

git-svn-id: svn+ssh://svn.freebsd.org/base/head@313908 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

commit 9a507901162fb476b9809da2919905735cd605af
Author: mjg <mjg@ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f>
Date: Fri Feb 17 22:09:55 2017 +0000

sx: fix mips build after r313855

The namespace in this file really needs cleaning up. In the meantime
let inline primitives be defined as long as LOCK_DEBUG is not enabled.

Reported by: kib

git-svn-id: svn+ssh://svn.freebsd.org/base/head@313901 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

commit aa6243a5124b9ceb3b1683ea4dbb0a133ce70095
Author: mjg <mjg@ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f>
Date: Fri Feb 17 15:40:24 2017 +0000

mtx: get rid of file/line args from slow paths if they are unused

This denotes changes which went in by accident in r313877.

On most production kernels both said parameters are zeroed and have nothing
reading them in either __mtx_lock_sleep or __mtx_unlock_sleep. Thus this change
stops passing them by internal consumers for which this is the case.

Kernel modules use _flags variants which are not affected kbi-wise.

git-svn-id: svn+ssh://svn.freebsd.org/base/head@313878 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

commit 688545a6af7ed0972653d6e2c6ca406ac511f39d
Author: mjg <mjg@ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f>
Date: Fri Feb 17 15:34:40 2017 +0000

mtx: restrict r313875 to kernels without LOCK_PROFILING

git-svn-id: svn+ssh://svn.freebsd.org/base/head@313877 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f

commit bbe6477138713da2d503f93cb5dd602e14152a08
Author: mjg <mjg@ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f>
Date: Fri Feb 17 14:55:59 2017 +0000

mtx: microoptimize lockstat handling in __mtx_lock_sleep

This saves a function call and multiple branches after the lock is acquired.


# 315393 16-Mar-2017 mjg

MFC r313472:

The runlock slow path would update the wrong variable before restarting the
loop, in effect corrupting the state.

Something was botched in the previous mfc attempt in r315380.


# 315386 16-Mar-2017 mjg

MFC r313853,r313859:

locks: remove SCHEDULER_STOPPED checks from primitives for modules

They all fallback to the slow path if necessary and the check is there.

This means a panicked kernel executing code from modules will be able to
succeed doing actual lock/unlock, but this was already the case for core code
which has said primitives inlined.

==

Introduce SCHEDULER_STOPPED_TD for use when the thread pointer was already read

Sprinkle in a few places.


# 315382 16-Mar-2017 mjg

MFC r313467:

locks: tidy up unlock fallback paths

Update comments to note these functions are reachable if lockstat is
enabled.

Check if the lock has any bits set before attempting unlock, which saves
an unnecessary atomic operation.


# 315380 16-Mar-2017 mjg

MFC r313454,r313472:

rwlock: implement rlock/runlock fast path

This improves singlethreaded throughput on my test machine from ~247 mln
ops/s to ~328 mln.

It is mostly about avoiding the setup cost of lockstat.

==

rwlock: fix r313454

The runlock slow path would update the wrong variable before restarting the
loop, in effect corrupting the state.


# 315379 16-Mar-2017 mjg

MFC r313392,r313784:

rwlock: implement RW_LOCK_WRITER_RECURSED bit

This moves recursion handling out of the inlined wunlock path and in
particular saves a read and a branch.

==

rwlock: tidy up r313392

While a new bit was added and thread alignment got shifted to accommodate it,
RW_READERS_SHIFT was not modified accordingly and clashed with the new flag.

This was surprisingly harmless. If the lock was taken for writing, other flags
were tested. If the lock was taken for reading, it would correctly work for
readers > 1 and this was the only relevant test performed.


# 315378 16-Mar-2017 mjg

MFC r313275,r313280,r313282,r313335:

mtx: move lockstat handling out of inline primitives

Lockstat requires checking if it is enabled and if so, calling a 6 argument
function. Further, determining whether to call it on unlock requires
pre-reading the lock value.

This is problematic in at least 3 ways:
- more branches in the hot path than necessary
- additional cacheline ping pong under contention
- bigger code

Instead, check first if lockstat handling is necessary and if so, just fall
back to regular locking routines. For this purpose a new macro is introduced
(LOCKSTAT_PROFILE_ENABLED).

LOCK_PROFILING uninlines all primitives. Fold the current inline lock
variant into _mtx_lock_flags to retain the support. With this change
the inline variants are not used when LOCK_PROFILING is defined and thus
can ignore its existence.

This results in:
text data bss dec hex filename
22259667 1303208 4994976 28557851 1b3c21b kernel.orig
21797315 1303208 4994976 28095499 1acb40b kernel.patched

i.e. about 3% reduction in text size.

A remaining action is to remove spurious arguments for internal kernel
consumers.
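
The resulting fast-path shape is roughly as below (a simplified sketch with
hypothetical names; the real inlines use LOCKSTAT_PROFILE_ENABLED and carry
additional arguments):

    /* Placeholder definitions; only the control flow matters here. */
    #define FOO_UNOWNED     ((uintptr_t)0)
    extern int  foo_lockstat_enabled(void);
    extern void _foo_lock_hard(struct foo_lock *, uintptr_t);

    static __inline void
    foo_lock(struct foo_lock *fl)
    {
        uintptr_t tid = (uintptr_t)curthread;

        /* Inline only the uncontested case with no active lockstat probe. */
        if (__predict_false(foo_lockstat_enabled()) ||
            !atomic_cmpset_acq_ptr(&fl->fl_lock, FOO_UNOWNED, tid))
            _foo_lock_hard(fl, tid);    /* everything else goes out of line */
    }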

==

sx: move lockstat handling out of inline primitives

See r313275 for details.

==

rwlock: move lockstat handling out of inline primitives

See r313275 for details.

One difference here is that recursion handling was removed from the fallback
routine. As it is it was never supposed to see a recursed lock in the first
place. Future changes will move it out of inline variants, but right now
there is no easy way to test if the lock is recursed without reading
additional words.

==

locks: fix recursion support after recent changes

When a relevant lockstat probe is enabled the fallback primitive is called with
a constant signifying a free lock. This works fine for typical cases but breaks
with recursion, since it checks if the passed value is that of the executing
thread.

Read the value if necessary.


# 315377 16-Mar-2017 mjg

MFC r313269,r313270,r313271,r313272,r313274,r313278,r313279,r313996,r314474

mtx: switch to fcmpset

The found value is passed to locking routines in order to reduce cacheline
accesses.

mtx_unlock grows an explicit check for regular unlock. On ll/sc architectures
the routine can fail even if the lock could have been handled by the inline
primitive.
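
The difference can be sketched as follows (hypothetical lock word and slow
path; atomic_fcmpset_acq_ptr() writes the value it observed back into 'v' on
failure, so the slow path receives it without an extra read):

    uintptr_t v, tid;

    tid = (uintptr_t)curthread;
    v = FOO_UNOWNED;                      /* hypothetical "free" value */
    if (!atomic_fcmpset_acq_ptr(&fl->fl_lock, &v, tid))
        _foo_lock_contested(fl, v, tid);  /* 'v' already holds what was seen */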

==

rwlock: switch to fcmpset

==

sx: switch to fcmpset

==

sx: uninline slock/sunlock

Shared locking routines explicitly read the value and test it. If the
change attempt fails, they fall back to a regular function which would
retry in a loop.

The problem is that with many concurrent readers the risk of failure is pretty
high and even the value returned by fcmpset is very likely going to be stale
by the time the loop in the fallback routine is reached.

Uninline said primitives. It gives a throughput increase when doing concurrent
slocks/sunlocks with 80 hardware threads from ~50 mln/s to ~56 mln/s.

Interestingly, rwlock primitives are already not inlined.

==

sx: add witness support missed in r313272

==

mtx: fix up _mtx_obtain_lock_fetch usage in thread lock

Since _mtx_obtain_lock_fetch no longer sets the argument to MTX_UNOWNED,
callers have to do it on their own.

==

mtx: fixup r313278, the assignment was supposed to go inside the loop

==

mtx: fix spin mutexes interaction with failed fcmpset

While doing so move recursion support down to the fallback routine.

==

locks: ensure proper barriers are used with atomic ops when necessary

Unclear how, but the locking routine for mutexes was using the *release*
barrier instead of acquire. This must have been either a copy-pasto or bad
completion.

Going through other uses of atomics shows no barriers in:
- upgrade routines (addressed in this patch)
- sections protected with turnstile locks - this should be fine as necessary
barriers are in the worst case provided by turnstile unlock

I would like to thank Mark Millard and andreast@ for reporting the problem and
testing previous patches before the issue got identified.


# 315341 16-Mar-2017 mjg

MFC r311172,r311194,r311226,r312389,r312390:

mtx: reduce lock accesses

Instead of spuriously re-reading the lock value, read it once.

This change also has a side effect of fixing a performance bug:
on failed _mtx_obtain_lock, it was possible that the re-read would find
the lock unowned, but in this case the primitive would make a trip
through turnstile code.

This is diff reduction to a variant which uses atomic_fcmpset.

==

Reduce lock accesses in thread lock similarly to r311172

==

mtx: plug open-coded mtx_lock access missed in r311172

==

rwlock: reduce lock accesses similarly to r311172

==

sx: reduce lock accesses similarly to r311172


# 315339 16-Mar-2017 mjg

MFC r312890,r313386,r313390:

Sprinkle __read_mostly on backoff and lock profiling code.

==

locks: change backoff to exponential

Previous implementation would use a random factor to spread readers and
reduce chances of starvation. This visibly reduces effectiveness of the
mechanism.

Switch to the more traditional exponential variant. Try to limit starvation
by imposing an upper limit of spins after which spinning is half of what
other threads get. Note the mechanism is turned off by default.
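
In outline the exponential variant behaves like this (an illustrative loop
with made-up names and tuning values; the real implementation lives in the
shared lock_delay machinery):

    u_int i, delay = 1;
    const u_int delay_max = 1024;       /* hypothetical cap */

    while (!foo_try_lock(fl)) {
        for (i = 0; i < delay; i++)
            cpu_spinwait();
        if (delay < delay_max)
            delay <<= 1;                /* double the wait after each failure */
    }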

==

locks: follow up r313386

Unfinished diff was committed by accident. The loop in lock_delay
was changed to decrement, but the loop iterator was still incrementing.


# 303953 11-Aug-2016 mjg

MFC r303562,303563,r303584,r303643,r303652,r303655,r303707:

rwlock: s/READER/WRITER/ in wlock lockstat annotation

==

sx: increment spin_cnt before cpu_spinwait in xlock

The change is a no-op only done for consistency with the rest of the file.

==

locks: change sleep_cnt and spin_cnt types to u_int

Both variables are uint64_t, but they only count spins or sleeps.
All reasonable values which we can get here comfortably hit in 32-bit range.

==

Implement trivial backoff for locking primitives.

All current spinning loops retry an atomic op the first chance they get,
which leads to performance degradation under load.

One classic solution to the problem consists of delaying the test to an
extent. This implementation has a trivial linear increment and a random
factor for each attempt.

For simplicity, this first-touch implementation only modifies spinning
loops where the lock owner is running. spin mutexes and thread lock were
not modified.

Current parameters are autotuned on boot based on mp_cpus.

Autotune factors are very conservative and are subject to change later.

==

locks: fix up ifdef guards introduced in r303643

Both sx and rwlocks had copy-pasted ADAPTIVE_MUTEXES instead of the correct
define.

==

locks: fix compilation for KDTRACE_HOOKS && !ADAPTIVE_* case

==

locks: fix sx compilation on mips after r303643

The kernel.h header is required for the SYSINIT macro, which apparently
was present on amd64 by accident.

Approved by: re (gjb)