History log of /freebsd-11-stable/sys/kern/kern_lock.c
Revision	Date	Author	Comments
# 315375	16-Mar-2017	mjg	MFC r313683: lockmgr: implement fast path The main lockmgr routine takes 8 arguments which makes it impossible to tail-call it by the intermediate vop_stdlock/unlock routines. The routine itself starts with an if-forest and reads from the lock itself several times. This slows things down both single- and multi-threaded. With the patch single-threaded fstats go 4% up and multithreaded up to ~27%. Note that there is still a lot of room for improvement.
# 302408	07-Jul-2016	gjb	Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here. Additional commits post-branch will follow. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 301157	01-Jun-2016	mjg	Microoptimize locking primitives by avoiding unnecessary atomic ops. The inline versions of the primitives do an atomic op and, if it fails, fall back to the actual primitives, which immediately retry the atomic op. The obvious optimisation is to check if the lock is free and only then proceed to do an atomic op. Reviewed by: jhb, vangyzen
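The check-before-atomic idea in r301157 can be illustrated outside the kernel with C11 atomics. This is only a sketch under assumptions: `struct toy_lock`, `LOCK_UNLOCKED` and the `toy_*` functions are hypothetical names invented for this example, not the FreeBSD primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_UNLOCKED ((uintptr_t)0)   /* hypothetical "free" value */

struct toy_lock {
    _Atomic uintptr_t owner;           /* LOCK_UNLOCKED or the owner's id */
};

/* Naive try-lock: always issues the atomic compare-and-swap. */
static bool
toy_trylock_naive(struct toy_lock *l, uintptr_t tid)
{
    uintptr_t expected = LOCK_UNLOCKED;

    return atomic_compare_exchange_strong_explicit(&l->owner, &expected, tid,
        memory_order_acquire, memory_order_relaxed);
}

/* Checked try-lock: plain load first, CAS only if the lock looked free. */
static bool
toy_trylock_checked(struct toy_lock *l, uintptr_t tid)
{
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) != LOCK_UNLOCKED)
        return false;                  /* held: skip the atomic op entirely */
    return toy_trylock_naive(l, tid);
}

static void
toy_unlock(struct toy_lock *l)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) !=
        LOCK_UNLOCKED);
    atomic_store_explicit(&l->owner, LOCK_UNLOCKED, memory_order_release);
}
```

Under contention the checked variant fails with a plain load, so a doomed compare-and-swap is never issued.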
# 298819	29-Apr-2016	pfg	sys/kern: spelling fixes in comments. No functional change.
# 286166	01-Aug-2015	markj	Don't modify curthread->td_locks unless INVARIANTS is enabled. This field is only used in a KASSERT that verifies that no locks are held when returning to user mode. Moreover, the td_locks accounting is only correct when LOCK_DEBUG > 0, which is implied by INVARIANTS. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D3205
# 277528	22-Jan-2015	hselasky	Revert for r277213: FreeBSD developers need more time to review patches in the surrounding areas like the TCP stack which are using MPSAFE callouts to restore distribution of callouts on multiple CPUs. Bump the __FreeBSD_version instead of reverting it. Suggested by: kmacy, adrian, glebius and kib Differential Revision: https://reviews.freebsd.org/D1438
# 277213	15-Jan-2015	hselasky	Major callout subsystem cleanup and rewrite: - Close a migration race where callout_reset() failed to set the CALLOUT_ACTIVE flag. - Callout callback functions are now allowed to be protected by spinlocks. - Switching the callout CPU number cannot always be done on a per-callout basis. See the updated timeout(9) manual page for more information. - The timeout(9) manual page has been updated to reflect how all the functions inside the callout API are working. The manual page has been made function oriented to make it easier to deduce how each of the functions making up the callout API are working without having to first read the whole manual page. Group all functions into a handful of sections which should give a quick top-level overview when the different functions should be used. - The CALLOUT_SHAREDLOCK flag and its functionality has been removed to reduce the complexity in the callout code and to avoid problems about atomically stopping callouts via callout_stop(). If someone needs it, it can be re-added. From my quick grep there are no CALLOUT_SHAREDLOCK clients in the kernel. - A new callout API function named "callout_drain_async()" has been added. See the updated timeout(9) manual page for a complete description. - Update the callout clients in the "kern/" folder to use the callout API properly, like cv_timedwait(). Previously there was some custom sleepqueue code in the callout subsystem, which has been removed, because we now allow callouts to be protected by spinlocks. This allows us to tear down the callout like done with regular mutexes, and a "td_slpmutex" has been added to "struct thread" to atomically teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and "SWT_SLEEPQTIMO" states can now be completely removed. Currently they are marked as available and will be cleaned up in a follow up commit. - Bump the __FreeBSD_version to indicate kernel modules need recompilation. 
- There have been several reports that this patch "seems to squash a serious bug leading to a callout timeout and panic". Kernel build testing: all architectures were built MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D1438 Sponsored by: Mellanox Technologies Reviewed by: jhb, adrian, sbruno and emaste
# 274474	13-Nov-2014	kib	Do not try to dereference thread pointer when the value is not a pointer. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 274092	04-Nov-2014	jhb	Add a new thread state "spinning" to schedgraph and add tracepoints at the start and stop of spinning waits in lock primitives.
# 273986	02-Nov-2014	kib	Followup to r273966. Fix the build with ADAPTIVE_LOCKMGRS kernel option. Note that the option is currently not used in any in-tree kernel configs, including LINTs. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 273966	02-Nov-2014	kib	Fix two issues with lockmgr(9) LK_CAN_SHARE() test, which determines whether the shared request for already shared-locked lock could be granted. Both problems result in the exclusive locker starvation. The concurrent exclusive request is indicated by either LK_EXCLUSIVE_WAITERS or LK_EXCLUSIVE_SPINNERS flags. The reverse condition, i.e. no exclusive waiters, must check that both flags are cleared. Add a flag LK_NODDLKTREAT for shared lock request to indicate that current thread guarantees that it does not own the lock in shared mode. This turns back the exclusive lock starvation avoidance code; see man page update for detailed description. Use LK_NODDLKTREAT when doing lookup(9). Reported and tested by: pho No objections from: attilio Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
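The "both flags must be cleared" condition from r273966 can be sketched as a small predicate. The flag values and `toy_*` names below are assumptions made for illustration. The real LK_CAN_SHARE() also handles the deadlock-treatment exception the commit mentions, which this sketch deliberately omits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, chosen only for this example. */
#define TOY_SHARE              0x01u
#define TOY_EXCLUSIVE_WAITERS  0x04u
#define TOY_EXCLUSIVE_SPINNERS 0x08u

/*
 * "No exclusive request pending" holds only when BOTH flags are clear.
 * Testing a single flag would let new sharers in while an exclusive
 * locker is spinning (or sleeping), which can starve it.
 */
static bool
toy_no_exclusive_pending(uintptr_t x)
{
    return (x & (TOY_EXCLUSIVE_WAITERS | TOY_EXCLUSIVE_SPINNERS)) == 0;
}

/* A shared request may join an already shared-locked lock. */
static bool
toy_can_share(uintptr_t x)
{
    return (x & TOY_SHARE) != 0 && toy_no_exclusive_pending(x);
}
```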
# 270795	29-Aug-2014	kib	Add function and wrapper to switch lockmgr and vnode lock back to auto-promotion of shared to exclusive. Tested by: hrs, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 258541	25-Nov-2013	attilio	- For kernels compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernels compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency on opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
# 255940	29-Sep-2013	kib	Add LK_TRYUPGRADE operation for lockmgr(9), which attempts to atomically upgrade shared lock to exclusive. On failure, error is returned and lock is not dropped in the process. Tested by: pho (previous version) No objections from: attilio Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (glebius)
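The semantics described for LK_TRYUPGRADE (upgrade atomically, and on failure keep the shared lock) map naturally onto a single compare-and-swap. The following is a sketch under assumptions: the lock-word layout, the `toy_*` names, and the use of EBUSY as the failure code are all chosen for this example and are not taken from kern_lock.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock word: bit 0 = shared, sharer count in the upper bits. */
#define TOY_SHARE       ((uintptr_t)0x1)
#define TOY_ONE_SHARER  ((uintptr_t)0x10)
#define TOY_SHARED(n)   (TOY_SHARE | ((uintptr_t)(n) * TOY_ONE_SHARER))

struct toy_lk {
    _Atomic uintptr_t word;   /* TOY_SHARED(n), or the exclusive owner's id */
};

/*
 * Try to turn the caller's shared hold into an exclusive one.
 * Succeeds only when the caller is the sole sharer; otherwise the lock
 * word is left untouched, so the shared hold is never dropped.
 */
static int
toy_tryupgrade(struct toy_lk *lk, uintptr_t tid)
{
    uintptr_t expected = TOY_SHARED(1);

    assert((tid & TOY_SHARE) == 0);   /* owner ids must not look shared */
    if (atomic_compare_exchange_strong(&lk->word, &expected, tid))
        return 0;
    return EBUSY;
}
```

Because the only write is the compare-and-swap itself, a failed attempt has no side effects, which is what lets the caller keep relying on its shared hold.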
# 255745	20-Sep-2013	davide	Fix lc_lock/lc_unlock() support for rmlocks held in shared mode. With current lock classes KPI it was really difficult because there was no way to pass an rmtracker object to the lock/unlock routines. In order to accomplish the task, modify the aforementioned functions so that they can return (or pass as argument) a uintptr_t, which is in the rm case used to hold a pointer to struct rm_priotracker for current thread. As an added bonus, this fixes rm_sleep() in the rm shared case, which right now can communicate priotracker structure between lc_unlock()/lc_lock(). Suggested by: jhb Reviewed by: jhb Approved by: re (delphij)
# 252212	25-Jun-2013	jhb	A few mostly cosmetic nits to aid in debugging: - Call lock_init() first before setting any lock_object fields in lock init routines. This way if the machine panics due to a duplicate init the lock's original state is preserved. - Somewhat similarly, don't decrement td_locks and td_slocks until after an unlock operation has completed successfully.
# 251326	03-Jun-2013	jhb	- Fix a couple of inverted panic messages for shared/exclusive mismatches of a lock within a single thread. - Fix handling of interlocks in WITNESS by properly requiring the interlock to be held exactly once if it is specified.
# 250411	09-May-2013	marcel	Add option WITNESS_NO_VNODE to suppress printing LORs between VNODE locks. To support this, VNODE locks are created with the LK_IS_VNODE flag. This flag is propagated down using the LO_IS_VNODE flag. Note that WITNESS still records the LOR. Only the printing and the optional entering into the kernel debugger is bypassed with the WITNESS_NO_VNODE option.
# 245113	06-Jan-2013	mjg	lockmgr: unlock interlock (if requested) when dealing with upgrade/downgrade requests for LK_NOSHARE locks, just like for shared locks. PR: kern/174969 Reviewed by: attilio MFC after: 1 week
# 244582	22-Dec-2012	attilio	Fixup r240424: On entering KDB backends, the hijacked thread to run interrupt context can still be idlethread. At that point, without the panic condition, it can still happen that idlethread then will try to acquire some locks to carry on some operations. Skip the idlethread check on block/sleep lock operations when KDB is active. Reported by: jh Tested by: jh MFC after: 1 week
# 243900	05-Dec-2012	attilio	Check for lockmgr recursion in case of disown and downgrade and panic also in !debugging kernel rather than having "undefined" behaviour. Tested by: avg MFC after: 1 week
# 240424	12-Sep-2012	attilio	Improve check coverage about idle threads. Idle threads are not allowed to acquire any lock but spinlocks. Deny any attempt to do so by panicking at the locking operation when INVARIANTS is on. Then, remove the check on blocking on a turnstile. The check in sleepqueues is left because they are not allowed to use tsleep() either which could happen still. Reviewed by: bde, jhb, kib MFC after: 1 week
# 233628	28-Mar-2012	fabient	Add software PMC support. New kernel events can be added at various location for sampling or counting. This will for example allow easy system profiling whatever the processor is with known tools like pmcstat(8). Simultaneous usage of software PMC and hardware PMC is possible, for example looking at the lock acquire failure, page fault while sampling on instructions. Sponsored by: NETASQ MFC after: 1 month
# 232547	05-Mar-2012	ivoras	Print out process name and thread id in the debugging message. This is useful because the message can end up in system logs in non-debugging operation. Reviewed by: attilio (earlier version)
# 228424	11-Dec-2011	avg	panic: add a switch and infrastructure for stopping other CPUs in SMP case Historical behavior of letting other CPUs merrily go on is the default for the time being. The new behavior can be switched on via kern.stop_scheduler_on_panic tunable and sysctl. Stopping of the CPUs has (at least) the following benefits: - more of the system state at panic time is preserved intact - threads and interrupts do not interfere with dumping of the system state Only one thread runs uninterrupted after panic if stop_scheduler_on_panic is set. That thread might call code that is also used in normal context and that code might use locks to prevent concurrent execution of certain parts. Those locks might be held by the stopped threads and would never be released. To work around this issue, it was decided that instead of explicit checks for panic context, we would rather put those checks inside the locking primitives. This change has substantial portions written and re-written by attilio and kib at various times. Other changes are heavily based on the ideas and patches submitted by jhb and mdf. bde has provided many insights into the details and history of the current code. The new behavior may cause problems for systems that use a USB keyboard for interfacing with system console. This is because of some unusual locking patterns in the ukbd code which have to be used because on one hand ukbd is below syscons, but on the other hand it has to interface with other usb code that uses regular mutexes/Giant for its concurrency protection. Dumping to USB-connected disks may also be affected. PR: amd64/139614 (at least) In cooperation with: attilio, jhb, kib, mdf Discussed with: arch@, bde Tested by: Eugene Grosbein <eugen@grosbein.net>, gnn, Steven Hartland <killing@multiplay.co.uk>, glebius, Andrew Boyer <aboyer@averesystems.com> (various versions of the patch) MFC after: 3 months (or never)
# 227588	16-Nov-2011	pjd	Constify arguments for locking KPIs where possible. This enables locking consumers to pass their own structures around as const and be able to assert locks embedded into those structures. Reviewed by: ed, kib, jhb
# 227309	07-Nov-2011	ed	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
# 224581	01-Aug-2011	kib	Fix the LK_NOSHARE lockmgr flag interaction with LK_UPGRADE and LK_DOWNGRADE lock ops. Namely, the ops should be NOP since LK_NOSHARE locks are always exclusive. Reported by: rmacklem Reviewed by: attilio Tested by: pho Approved by: re (kensmith) MFC after: 1 week
# 219028	25-Feb-2011	netchild	Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/ PMC/SYSV/...). No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availability if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
# 217265	11-Jan-2011	jhb	Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes. Reviewed by: bde
# 211531	20-Aug-2010	jhb	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days
# 201710	06-Jan-2010	attilio	Fix typos.
# 201709	06-Jan-2010	attilio	Tweak comments.
# 201703	06-Jan-2010	attilio	Exclusive waiters sleeping with LK_SLEEPFAIL on and using interruptible sleeps/timeout may have left spurious lk_exslpfail counts on, so clean it up even when accessing a shared queue acquisition, giving to lk_exslpfail the value of 'upper limit'. In the worst case scenario, in fact (mixed interruptible sleep / LK_SLEEPFAIL waiters), what may happen is that both queues are awakened even if that's not necessary, but still no harm. Reported by: Lucius Windschuh <lwindschuh at googlemail dot com> Reviewed by: kib Tested by: pho, Lucius Windschuh <lwindschuh at googlemail dot com>
# 200447	12-Dec-2009	attilio	In current code, threads performing an interruptible sleep (on both sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will leave the waiters flag on, forcing the owner to do a wakeup even if the waiter queue is empty. That operation may lead to a deadlock in the case of doing a fake wakeup on the "preferred" (based on the wakeup algorithm) queue while the other queue has real waiters on it, because nobody is going to wake up the 2nd queue waiters and they will sleep indefinitely. A similar bug is present for lockmgr in the case the waiters are sleeping with LK_SLEEPFAIL on. In this case, even if the waiters queue is not empty, the waiters won't progress after being awakened but they will just fail, still not taking care of the 2nd queue waiters (as instead the lock owner doing the wakeup would expect). In order to fix this bug in a cheap way (without adding too much locking and complicating too much the semantic) add a sleepqueue interface which does report the actual number of waiters on a specified queue of a waitchannel (sleepq_sleepcnt()) and use it in order to determine if the exclusive waiters (or shared waiters) are actually present on the lockmgr (or sx) before giving them precedence in the wakeup algorithm. This fix alone, however, doesn't solve the LK_SLEEPFAIL bug. In order to cope with it, add the tracking of how many exclusive LK_SLEEPFAIL waiters a lockmgr has and if all the waiters on the exclusive waiters queue are LK_SLEEPFAIL just wake both queues. The sleepq_sleepcnt() introduction and ABI breakage require __FreeBSD_version bumping. Reported by: avg, kib, pho Reviewed by: kib Tested by: pho
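The wakeup decision described in r200447 can be reduced to a pure function of three counts. This is a deliberately simplified sketch: the `toy_*` names, the bitmask encoding and the preference for the exclusive queue are assumptions for illustration, and the real code works on the lock word and on sleepq_sleepcnt() results rather than plain integers.

```c
/* Which sleep queues to wake; values can be OR-ed together. */
enum {
    TOY_WAKE_NONE   = 0,
    TOY_WAKE_SHARED = 1,
    TOY_WAKE_EXCL   = 2
};

/*
 * excl_waiters:   threads actually sleeping on the exclusive queue
 * excl_sleepfail: how many of those used LK_SLEEPFAIL (an upper bound)
 * shared_waiters: threads actually sleeping on the shared queue
 */
static int
toy_pick_queues(unsigned excl_waiters, unsigned excl_sleepfail,
    unsigned shared_waiters)
{
    int shared = shared_waiters > 0 ? TOY_WAKE_SHARED : TOY_WAKE_NONE;

    /* A stale waiters flag with an empty queue must not win precedence. */
    if (excl_waiters == 0)
        return shared;

    /*
     * If every exclusive waiter will simply fail after waking up, nobody
     * would take care of the shared queue, so wake both queues.
     */
    if (excl_sleepfail >= excl_waiters)
        return TOY_WAKE_EXCL | shared;

    return TOY_WAKE_EXCL;
}
```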
# 199008	06-Nov-2009	attilio	Save the stack when doing a lockmgr_disown() call. Requested by: kib MFC: 3 days
# 197735	03-Oct-2009	attilio	When releasing a lockmgr held in shared mode we need to use a write memory barrier in order to avoid CPU instruction reordering on architectures which don't have strongly ordered writes. Diagnosed by: fabio
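In portable C11 terms, the barrier requirement from r197735 corresponds to dropping a shared hold with release ordering. The sketch below assumes a toy lock that only counts sharers (no exclusive mode, no waiters). `struct toy_shlock` and both functions are hypothetical names, not the lockmgr implementation.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_shlock {
    atomic_uint sharers;   /* number of threads holding the lock shared */
};

static void
toy_slock(struct toy_shlock *l)
{
    /* Acquire: accesses to the protected data cannot move above this. */
    atomic_fetch_add_explicit(&l->sharers, 1, memory_order_acquire);
}

static void
toy_sunlock(struct toy_shlock *l)
{
    /*
     * Release: everything this thread did while holding the lock is
     * ordered before the count drops, even on weakly ordered CPUs.
     */
    unsigned prev = atomic_fetch_sub_explicit(&l->sharers, 1,
        memory_order_release);

    assert(prev > 0);      /* must have been held */
    (void)prev;            /* silence unused warning when NDEBUG is set */
}
```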
# 196970	08-Sep-2009	phk	Revert previous commit and add myself to the list of people who should know better than to commit with a cat in the area.
# 196969	08-Sep-2009	phk	Add necessary include.
# 196772	02-Sep-2009	attilio	Fix some bugs related to adaptive spinning: In the lockmgr support: - GIANT_RESTORE() is just called when the sleep finishes, so the current code can end up with a Giant unlock problem. Fix it by appropriately calling GIANT_RESTORE() when needed. Note that this is not exactly ideal because for any iteration of the adaptive spinning we drop and restore Giant, but the overhead should not be a factor. - In the lock held in exclusive mode case, after the adaptive spinning is brought to completion, we should just retry to acquire the lock instead of falling through. Fix that. - Fix a style nit In the sx support: - Call GIANT_SAVE() before looping. This saves some overhead because in the current code GIANT_SAVE() is called several times. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
# 196334	17-Aug-2009	attilio	* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to a pointer-fetching specific operation check. Consequently, rename the operation ASSERT_ATOMIC_LOAD_PTR(). * Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking directly alignment on the word boundary, for all the given specific architectures. That's a bit too strict for some common case, but it assures safety. * Add a comment explaining the scope of the macro * Add a new stub in the lockmgr specific implementation Tested by: marcel (initial version), marius Reviewed by: rwatson, jhb (comment specific review) Approved by: re (kib)
# 194317	17-Jun-2009	attilio	Introduce support for adaptive spinning in lockmgr. Currently, as it has received little tuning, the support is disabled by default, but it can be opted into with the option ADAPTIVE_LOCKMGRS. Due to the nature of lockmgrs, adaptive spinning needs to be selectively enabled for any interested lockmgr. The support is bi-directional, or, in other words, it will work whether the lock is held in read or write mode. In particular, the read path is open to further tuning using the sysctls debug.lockmgr.retries and debug.lockmgr.loops. Ideally, such sysctls should be axed or compiled out before release. Additionally note that adaptive spinning doesn't cope well with LK_SLEEPFAIL. The reason is that many (and probably all) consumers of LK_SLEEPFAIL are mainly interested in knowing if the interlock was dropped or not in order to reacquire it and re-test initial conditions. This directly interacts with adaptive spinning because lockmgr needs to drop the interlock while spinning in order to avoid a deadlock (further details in the comments inside the patch). Final note: finding someone willing to help on tuning this with relevant workloads would be both very important and appreciated. Tested by: jeff, pho Requested by: many
# 193307	02-Jun-2009	attilio	Handle lock recursion differently by always checking against LO_RECURSABLE instead of the lock's own flag itself. Tested by: pho
# 192853	26-May-2009	sson	Add the OpenSolaris dtrace lockstat provider. The lockstat provider adds probes for mutexes, reader/writer and shared/exclusive locks to gather contention statistics and other locking information for dtrace scripts, the lockstat(1M) command and other potential consumers. Reviewed by: attilio jhb jb Approved by: gnn (mentor)
# 192022	12-May-2009	trasz	Add missing 'break' statement. Found with: Coverity Prevent(tm) CID: 3919
# 189846	15-Mar-2009	jeff	- Wrap lock profiling state variables in #ifdef LOCK_PROFILING blocks.
# 189788	14-Mar-2009	jeff	- Call lock_profile_release when we're transitioning a lock to be owned by LK_KERNPROC. Discussed with: attilio
# 188244	06-Feb-2009	jhb	Tweak the output of VOP_PRINT/vn_printf() some. - Align the fifo output in fifo_print() with other vn_printf() output. - Remove the leading space from lockmgr_printinfo() so its output lines up in vn_printf(). - lockmgr_printinfo() now ends with a newline, so remove an extra newline from vn_printf().
# 182914	10-Sep-2008	jhb	Teach WITNESS about the interlocks used with lockmgr. This removes a bunch of spurious witness warnings since lockmgr grew witness support. Before this, every time you passed an interlock to a lockmgr lock WITNESS treated it as a LOR. Reviewed by: attilio
# 182010	22-Aug-2008	jhb	Use |= rather than += when aggregating requests to wakeup the swapper. What we really want is an inclusive or of all the requests, and += can in theory roll over to 0.
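The rollover that r182010 guards against is easy to reproduce with a narrow accumulator. The sketch below uses `uint8_t` purely to make the wrap visible with few inputs, which is an assumption of this example and not the type used in the kernel, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the requests: the total wraps modulo 256 and can land on 0. */
static uint8_t
aggregate_add(const uint8_t *reqs, size_t n)
{
    uint8_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc += reqs[i];
    return acc;
}

/* Inclusive OR of the requests: non-zero if any request was non-zero. */
static uint8_t
aggregate_or(const uint8_t *reqs, size_t n)
{
    uint8_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc |= reqs[i];
    return acc;
}
```

With 256 requests that each return 1, `aggregate_add()` yields 0 and the wakeup would be lost, while `aggregate_or()` still yields 1.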
# 181334	05-Aug-2008	jhb	If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks
# 180798	25-Jul-2008	kib	s/alredy/already/ in the comments and the log message.
# 179306	25-May-2008	attilio	The "if" semantic is not needed, just fix this.
# 178166	12-Apr-2008	attilio	Use a "rel" memory barrier for disowning the lock as it comes from an exclusive locking operation.
# 178159	12-Apr-2008	attilio	- Re-introduce WITNESS support for lockmgr. The only difference from the old implementation is that lockmgr*() functions now accept the LK_NOWITNESS flag, which skips ordering for the instanced calling. - Remove a useless stub in witness_checkorder() (because the above check doesn't allow it ever happening) and allow witness_upgrade() to accept non-try operation too.
# 178150	12-Apr-2008	attilio	- Remove a stale comment. - Add an extra assertion in order to catch malformed requested operations.
# 177982	07-Apr-2008	attilio	- Use a different encoding for lockmgr options: make them encoded by bit in order to allow per-bit checks on the options flag, in particular in the consumers code [1] - Re-enable the check against TDP_DEADLKTREAT as the anti-waiters starvation patch allows exclusive waiters to override new shared requests. [1] Requested by: pjd, jeff
# 177957	06-Apr-2008	attilio	Optimize lockmgr in order to get rid of the pool mutex interlock, of the state transitioning flags and of msleep(9) callings. Use, instead, an algorithm very similar to what sx(9) and rwlock(9) already do and direct accesses to the sleepqueue(9) primitive. In order to avoid writer starvation a mechanism very similar to what rwlock(9) uses now is implemented, with the corresponding per-thread shared lockmgrs counter. This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and lockmgr_args_rw(). These two are like the 2 "normal" versions, but they both accept a rwlock as interlock. In order to realize this, the general lockmgr manager function "__lockmgr_args()" has been implemented through the generic lock layer. It supports all the blocking primitives, but currently only these 2 mappers live. The patch drops the support for WITNESS atm, but it will probably be added soon. Also, there is a little race in the draining code which is also present in the current CVS stock implementation: if some sharers, once they wake up, are in the runqueue they can contend the lock with the exclusive drainer. This is hard to fix but the now committed code mitigates this issue a lot better than the (past) CVS version. In addition assertive KA_HELD and KA_UNHELD have been made mute assertions because they are dangerous and they will no longer be supported soon. In order to avoid namespace pollution, stack.h is split into two parts: one which includes only the "struct stack" definition (_stack.h) and one defining the KPI. In this way, newly added _lockmgr.h can just include _stack.h. Kernel ABI results heavily changed by this commit (the now committed version of "struct lock" is a lot smaller than the previous one) and KPI results broken by lockmgr_rw() / lockmgr_args_rw() introduction, so manpages and __FreeBSD_version will be updated accordingly. 
Tested by: kris, pho, jeff, danger Reviewed by: jeff Sponsored by: Google, Summer of Code program 2007
# 176708	01-Mar-2008	attilio	- Handle buffer lock waiters count directly in the buffer cache instead of relying on the lockmgr support [1]: * bump the waiters only if the interlock is held * let brelvp() return the waiters count * rely on brelvp() instead of BUF_LOCKWAITERS() in order to check for the waiters number - Remove a namespace pollution introduced recently with lockmgr.h including lock.h by including lock.h directly in the consumers and making it mandatory for using lockmgr. - Modify flags accepted by lockinit(): * introduce LK_NOPROFILE which disables lock profiling for the specified lockmgr * introduce LK_QUIET which disables ktr tracing for the specified lockmgr [2] * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it can only be used on a per-instance basis - Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer used This patch breaks KPI so __FreeBSD_version will be bumped and manpages updated by further commits. Additionally, 'struct buf' changes result in a disturbed ABI also. [2] Really, currently there is no ktr tracing in the lockmgr, but it will be added soon. [1] Submitted by: kib Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
# 176559	25-Feb-2008	attilio	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>
# 176320	15-Feb-2008	attilio	- Introduce lockmgr_args() in the lockmgr space. This function performs the same operation as lockmgr() but accepts a custom wmesg, prio and timo for the particular lock instance, overriding default values lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo. - Use lockmgr_args() in order to implement BUF_TIMELOCK() - Cleanup BUF_LOCK() - Remove LK_INTERNAL as it is no longer used in the lockmgr namespace Tested by: Andrea Barberio <insomniac at slackware dot it>
# 176249	13-Feb-2008	attilio	- Add real assertions to lockmgr locking primitives. A couple of notes for this: * WITNESS support, when enabled, is only used for shared locks in order to avoid problems with the "disowned" locks * KA_HELD and KA_UNHELD only exist in the lockmgr namespace in order to assert for a generic thread (not curthread) owning or not the lock. Really, this kind of check is bogus but it seems very widespread in the consumers code. So, for the moment, we cater to this untrusted behaviour, until the consumers are fixed and the options can be removed (hopefully during 8.0-CURRENT lifecycle) * Implementing KA_HELD and KA_UNHELD (not supported natively by WITNESS) made necessary the introduction of LA_MASKASSERT which specifies the range for default lock assertion flags * About other aspects, lockmgr_assert() follows exactly what other locking primitives offer about this operation. - Build real assertions for buffer cache locks on the top of lockmgr_assert(). They can be used with the BUF_ASSERT_*(bp) paradigm. - Add checks at lock destruction time and use a cookie for verifying lock integrity at any operation. - Redefine BUF_LOCKFREE() in order to not use a direct assert but let it rely on the aforementioned destruction time check. KPI results evidently broken, so __FreeBSD_version bumping and manpage update result necessary and will be committed soon. Side note: lockmgr_assert() will be used soon in order to implement real assertions in the vnode namespace replacing the legacy and still bogus "VOP_ISLOCKED()" way. Tested by: kris (earlier version) Reviewed by: jhb
# 176116	08-Feb-2008	attilio	Convert all explicit instances of VOP_ISLOCKED(arg, NULL) into VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should only accept curthread as argument; this will lead to axing the additional argument from both functions, making the code cleaner. Reviewed by: jeff, kib
# 176039	06-Feb-2008	attilio	td cannot be NULL in that place, so just axe out the check.
# 176014	05-Feb-2008	attilio	Add WITNESS support to lockmgr locking primitive. This support tries to be as parallel as possible with other locking primitives, but there are differences; more specifically: - The base witness support is already equipped for allowing lock duplication acquisition as lockmgr relies on this. - In the case of lockmgr_disown() the lock results unlocked by witness even if it is still held by the "kernel context" - In the case of upgrading we can have 3 different situations: * Total unlocking of the shared lock and nothing else * Real witness upgrade if the owner is the first upgrader * Shared unlocking and exclusive locking if the owner is not the first upgrader but it is still allowed to upgrade - LK_DRAIN is basically handled like an exclusive acquisition Additionally new options LK_NODUP and LK_NOWITNESS can now be used with lockinit(): LK_NOWITNESS disables WITNESS for the specified lock while LK_NODUP enables duplicated locks tracking. This will require manpages update and a __FreeBSD_version bumping (addressed by further commits). This patch also fixes a problem occurring if a lockmgr is held in exclusive mode and the same owner tries to acquire it in shared mode: currently there is a spurious shared locking acquisition while what we really want is a lock downgrade. Probably, this situation can be better served with a EDEADLK failing errno return. Side note: first testing on this patch already revealed several LORs, so please expect LORs cascades until resolved. NTFS also is reported broken by WITNESS introduction. BTW, NTFS is exposing a lock leak which needs to be fixed, and this patch can help it out if rightly tweaked. Tested by: kris, yar, Scot Hetzel <swhetzel at gmail dot com>
# 175635	24-Jan-2008	attilio	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and not currently used. Hopefully this will soon be replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Additionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavily broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo
# 175229	11-Jan-2008	attilio	lockmgr() function will return successfully when trying to work under panic but it won't actually lock anything. This can lead some paths to reach lockmgr_disown() with an inconsistent lock, which will trigger the related assertions. Fix those so that they recognize the panic situation and do not trigger. Reported by: pho Submitted by: kib
# 175167	08-Jan-2008	attilio	Fix a last second typo about recent lockmgr_disown() introduction.
# 175166	08-Jan-2008	attilio	Remove explicit calling of lockmgr() with the NULL argument. Now, lockmgr() function can only be called passing curthread and the KASSERT() is upgraded according with this. In order to support on-the-fly owner switching, the new function lockmgr_disown() has been introduced and gets used in BUF_KERNPROC(). KPI, so, results changed and FreeBSD version will be bumped soon. Differently from previous code, we assume idle thread cannot try to acquire the lockmgr as it cannot sleep, so loose the relative check[1] in BUF_KERNPROC(). Tested by: kris [1] kib asked for a KASSERT in the lockmgr_disown() about this condition, but after thinking at it, as this is a well known general rule, I found it not really necessary.
# 174951	27-Dec-2007	attilio	Trim out now unused option LK_EXCLUPGRADE from the lockmgr namespace. This option just adds complexity and the new implementation will no longer support it, so axing it now that it is unused is probably the best idea. FreeBSD version is bumped in order to reflect the KPI breakage introduced by this patch. In the ports tree, kris found that only old OSKit code uses it, but as it is thought to work only on the 2.x kernel series, version bumping will solve any problem.
# 174948	27-Dec-2007	attilio	In order to avoid a huge class of deadlocks (in particular in interactions with the interlock), owner of the lock should be only curthread or at least, for its limited usage, NULL which identifies LK_KERNPROC. The thread "extra argument" for the lockmgr interface is going to be removed in the near future, but for the moment, just let kernel run for some days with this check on in order to find potential deadlocking places around the kernel and fix them.
# 174137	01-Dec-2007	rwatson	Modify stack(9) stack_print() and stack_sbuf_print() routines to use new linker interfaces for looking up function names and offsets from instruction pointers. Create two variants of each call: one that is "DDB-safe" and avoids locking in the linker, and one that is safe for use in live kernels, by virtue of observing locking, and in particular safe when kernel modules are being loaded and unloaded simultaneous to their use. This will allow them to be used outside of debugging contexts. Modify two of three current stack(9) consumers to use the DDB-safe interfaces, as they run in low-level debugging contexts, such as inside lockmgr(9) and the kernel memory allocator. Update man page.
# 173876	24-Nov-2007	attilio	transferlockers() is a very dangerous and hack-ish function as waiters should never be moved by one lock to another. As, luckily, nothing in our tree is using it, axe the function. This breaks lockmgr KPI, so interested, third-party modules should update their source code with appropriate replacement. Ok'ed by: ups, rwatson MFC after: 3 days
# 173733	18-Nov-2007	attilio	Expand lock class with the "virtual" function lc_assert which will offer a unified way for all the lock primitives to express lock assertions. Currently, lockmgrs and rmlocks don't have assertions, so just panic in that case. This will be a base for more callout improvements. Ok'ed by: jhb, jeff
# 173600	14-Nov-2007	julian	generally we are interested in what thread did something as opposed to what process. Since threads by default have the name of the process unless over-written with more useful information, just print the thread name instead.
# 169675	18-May-2007	jhb	Move lock_profile_object_{init,destroy}() into lock_{init,destroy}().
# 168070	30-Mar-2007	jhb	- Use lock_init/lock_destroy() to setup the lock_object inside of lockmgr. We can now use LOCK_CLASS() as a stronger check in lockmgr_chain() as a result. This required putting back lk_flags as lockmgr's use of flags conflicted with other flags in lo_flags otherwise. - Tweak 'show lock' output for lockmgr to match sx, rw, and mtx.
# 167787	21-Mar-2007	jhb	Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes, rwlocks, and sx locks to 'lock_object'.
# 167782	21-Mar-2007	jhb	Handle the case when a thread is blocked on a lockmgr lock with LK_DRAIN in DDB's 'show sleepchain'. MFC after: 3 days
# 167368	09-Mar-2007	jhb	Add two new function pointers 'lc_lock' and 'lc_unlock' to lock classes. These functions are intended to be used to drop a lock and then reacquire it when doing a sleep such as msleep(9). Both functions accept a 'struct lock_object *' as their first parameter. The 'lc_unlock' function returns an integer that is then passed as the second parameter to the subsequent 'lc_lock' function. This can be used to communicate state. For example, sx locks and rwlocks use this to indicate if the lock was share/read locked vs exclusive/write locked. Currently, spin mutexes and lockmgr locks do not provide working lc_lock and lc_unlock functions.
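The drop-and-reacquire protocol described in r167368 can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `struct lock_object` and `struct lock_class` are reduced to the members the commit message mentions, and `toy_rwlock`, `toy_lock()`, `toy_unlock()`, `toy_class` and `sleep_dropping()` are hypothetical names that do not exist in the kernel. The only behaviour taken from the log is that the integer returned by `lc_unlock` is handed back to `lc_lock` so the lock is retaken in the same mode.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the kernel's real structures carry more members. */
struct lock_object {
	const char *lo_name;
};

struct lock_class {
	const char *lc_name;
	void (*lc_lock)(struct lock_object *lock, int how);
	int (*lc_unlock)(struct lock_object *lock);
};

/* Hypothetical toy lock; lock_object is the first member so casts work. */
struct toy_rwlock {
	struct lock_object lock_object;
	int readers;	/* number of shared holders */
	int writer;	/* 1 if held exclusively */
};

/* Release the lock and report how it was held: 1 = exclusive, 0 = shared. */
static int
toy_unlock(struct lock_object *lock)
{
	struct toy_rwlock *rw = (struct toy_rwlock *)lock;

	if (rw->writer) {
		rw->writer = 0;
		return (1);
	}
	rw->readers--;
	return (0);
}

/* Reacquire the lock in the mode reported earlier by toy_unlock(). */
static void
toy_lock(struct lock_object *lock, int how)
{
	struct toy_rwlock *rw = (struct toy_rwlock *)lock;

	if (how)
		rw->writer = 1;
	else
		rw->readers++;
}

/* C99-style member initialization, in the spirit of r167366. */
static struct lock_class toy_class = {
	.lc_name = "toy rw",
	.lc_lock = toy_lock,
	.lc_unlock = toy_unlock,
};

/* Drop the lock around a (pretend) sleep, then retake it in the same mode. */
static void
sleep_dropping(struct lock_class *lc, struct lock_object *lock)
{
	int how = lc->lc_unlock(lock);

	/* ... the thread would sleep here ... */
	lc->lc_lock(lock, how);
}
```

Because the mode travels through the integer, a generic sleep routine in this sketch needs no knowledge of the concrete lock type behind the `struct lock_object *`.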
# 167366	09-Mar-2007	jhb	Use C99-style struct member initialization for lock classes.
# 167012	26-Feb-2007	kmacy	general LOCK_PROFILING cleanup - only collect timestamps when a lock is contested - this reduces the overhead of collecting profiles from 20x to 5x - remove unused function from subr_lock.c - generalize cnt_hold and cnt_lock statistics to be kept for all locks - NOTE: rwlock profiling generates invalid statistics (and most likely always has) someone familiar with that should review
# 164246	13-Nov-2006	kmacy	track lock class name in a way that doesn't break WITNESS
# 164212	12-Nov-2006	kmacy	show lock class in profiling output for default case where type is not specified when initializing the lock Approved by: scottl (standing in for mentor rwatson)
# 164159	11-Nov-2006	kmacy	MUTEX_PROFILING has been generalized to LOCK_PROFILING. We now profile wait (time waited to acquire) and hold times for all kernel locks. If the architecture has a system synchronized TSC, the profiling code will use that - thereby minimizing profiling overhead. Large chunks of profiling code have been moved out of line, the overhead measured on the T1 for when it is compiled in but not enabled is < 1%. Approved by: scottl (standing in for mentor rwatson) Reviewed by: des and jhb
# 162941	02-Oct-2006	tegge	If the buffer lock has waiters after the buffer has changed identity then getnewbuf() needs to drop the buffer in order to wake waiters that might sleep on the buffer in the context of the old identity.
# 161337	15-Aug-2006	jhb	Add a new 'show sleepchain' ddb command similar to 'show lockchain' except that it operates on lockmgr and sx locks. This can be useful for tracking down vnode deadlocks in VFS for example. Note that this command is a bit more fragile than 'show lockchain' as we have to poke around at the wait channel of a thread to see if it points to either a struct lock or a condition variable inside of a struct sx. If td_wchan points to something unmapped, then this command will terminate early due to a fault, but no harm will be done.
# 161322	15-Aug-2006	jhb	Add a 'show lockmgr' command that dumps the relevant details of a lockmgr lock.
# 160356	14-Jul-2006	pjd	Remove duplicated #include.
# 153693	23-Dec-2005	jeff	- Remove an unused include. Submitted by: Antoine Brodin <antoine.brodin@laposte.net>
# 150807	02-Oct-2005	rwatson	Include kdb.h so that kdb_active is declared regardless of KDB being included in the kernel. MFC after: 0 days
# 150646	27-Sep-2005	rwatson	In lockstatus(), don't lock and unlock the interlock when testing the sleep lock status while kdb_active, or we risk contending with the mutex on another CPU, resulting in a panic when using "show lockedvnods" while in DDB. MFC after: 3 days Reviewed by: jhb Reported by: kris
# 149723	02-Sep-2005	ssouhlal	Print out a warning and a backtrace if we try to unlock a lockmgr that we do not hold. Glanced at by: phk MFC after: 3 days
# 149574	29-Aug-2005	pjd	Add 'depth' argument to CTRSTACK() macro, which allows to reduce number of ktr slots used. If 'depth' is equal to 0, the whole stack will be logged, just like before.
# 148669	03-Aug-2005	jeff	- Fix a problem that slipped through review; the stack member of the lockmgr structure should have the lk_ prefix. - Add stack_print(lkp->lk_stack) to the information printed with lockmgr_printinfo().
# 148668	03-Aug-2005	jeff	- Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lock caller by saving the stack of the last locker/unlocker in lockmgr. We also put the stack in KTR at the moment. Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
# 144928	12-Apr-2005	jeff	- Differentiate two UPGRADE panics so I have a better idea of what's going on here.
# 144705	06-Apr-2005	jeff	- Remove dead code.
# 144589	03-Apr-2005	jeff	- Slightly restructure acquire() so I can add more ktr information and an assert to help find two strange bugs. - Remove some nearby spls.
# 144372	31-Mar-2005	jeff	- Add a LK_NOSHARE flag which forces all shared lock requests to be treated as exclusive lock requests. Sponsored by: Isilon Systems, Inc.
# 144364	31-Mar-2005	jeff	- Remove apause(). It makes no sense with our present mutex implementation since simply unlocking a mutex does not ensure that one of the waiters will run and acquire it. We're more likely to reacquire the mutex before anyone else has a chance. It has also bit me three times now, as it's not safe to drop the interlock before sleeping in many cases. Sponsored by: Isilon Systems, Inc.
# 144222	28-Mar-2005	jeff	- Don't bump the count twice in the LK_DRAIN case. Sponsored by: Isilon Systems, Inc.
# 144082	24-Mar-2005	jeff	- Restore COUNT() in all of its original glory. Don't make it dependent on DEBUG as ufs will soon grow a dependency on this count. Discussed with: bde Sponsored by: Isilon Systems, Inc.
# 144060	24-Mar-2005	jeff	- Complete the implementation of td_locks. Track the number of outstanding lockmgr locks that this thread owns. This is complicated due to LK_KERNPROC and because lockmgr tolerates unlocking an unlocked lock. Sponsored by: Isilon Systems, Inc.
# 143621	15-Mar-2005	jeff	- transferlockers() requires the interlock to be SMP safe. Sponsored by: Isilon Systems, Inc.
# 140824	25-Jan-2005	jeff	- Include LK_INTERLOCK in LK_EXTFLG_MASK so that it makes its way into acquire. - Correct the condition that causes us to skip apause() to only require the presence of LK_INTERLOCK. Sponsored by: Isilon Systems, Inc.
# 140711	24-Jan-2005	jeff	- Do not use APAUSE if LK_INTERLOCK is set. We lose synchronization if the lockmgr interlock is dropped after the caller's interlock is dropped. - Change some lockmgr KTRs to be slightly more helpful. Sponsored By: Isilon Systems, Inc.
# 139804	06-Jan-2005	imp	/* -> /*- for copyright notices, minor format tweaks as necessary
# 138203	29-Nov-2004	ps	When upgrading the shared lock to an exclusive lock, if we discover that the exclusive lock is already held, then we call panic. Don't clobber internal lock state before panic'ing. This change improves debugging if this case were to happen. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson
# 134365	26-Aug-2004	kan	Reintroduce slightly modified patch from kern/69964. Check for LK_HAVE_EXL in both acquire invocations. MFC after: 5 days
# 134187	23-Aug-2004	kan	Temporarily back out r1.74 as it seems to cause a number of regressions according to numerous reports. It might get reintroduced some time later when an exact failure mode is understood better.
# 133859	16-Aug-2004	kan	Upgrading a lock does not play well together with acquiring an exclusive lock and can lead to two threads being granted exclusive access. Check that no one has the same lock in exclusive mode before proceeding to acquire it. The LK_WANT_EXCL and LK_WANT_UPGRADE bits act as mini-locks and can block other threads. Normally this is not a problem since the mini locks are upgraded to full locks and the release of the locks will unblock the other threads. However if a thread reset the bits without obtaining a full lock other threads are not awoken. Add missing wakeups for these cases. PR: kern/69964 Submitted by: Stephan Uphoff <ups at tree dot com> Very good catch by: Stephan Uphoff <ups at tree dot com>
# 132587	23-Jul-2004	rwatson	Don't include a "\n" in KTR output, it confuses automatic parsing.
# 130023	02-Jun-2004	tjr	Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid having to acquire sched_lock when manipulating it in lockmgr(), uiomove(), and uiomove_fromphys(). Reviewed by: jhb
# 124163	06-Jan-2004	kan	Add pid to the info printed in lockmgr_printinfo. This makes VFS diagnostic messages slightly more useful.
# 117660	15-Jul-2003	truckman	Rearrange the SYSINIT order to call lockmgr_init() earlier so that the runtime lockmgr initialization code in lockinit() can be eliminated. Reviewed by: jhb
# 117494	12-Jul-2003	truckman	Extend the mutex pool implementation to permit the creation and use of multiple mutex pools with different options and sizes. Mutex pools can be created with either the default sleep mutexes or with spin mutexes. A dynamically created mutex pool can now be destroyed if it is no longer needed. Create two pools by default, one that matches the existing pool that uses the MTX_NOWITNESS option that should be used for building higher level locks, and a new pool with witness checking enabled. Modify the users of the existing mutex pool to use the appropriate pool in the new implementation. Reviewed by: jhb
# 116182	10-Jun-2003	obrien	Use __FBSDID().
# 112106	11-Mar-2003	jhb	Use the KTR_LOCK mask for logging events via KTR in lockmgr() rather than KTR_LOCKMGR. lockmgr locks are locks just like other locks.
# 111883	04-Mar-2003	jhb	Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls to WITNESS_WARN().
# 111463	25-Feb-2003	jeff	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick
# 110986	16-Feb-2003	jeff	- Add a WITNESS_SLEEP() for the appropriate cases in lockmgr().
# 110414	05-Feb-2003	julian	The lockmanager has to keep track of locks per thread, not per process. Submitted by: david Xu (davidxu@) Reviewed by: jhb@
# 110190	01-Feb-2003	julian	Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.
# 109877	26-Jan-2003	davidxu	Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread that owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and take those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transferred to the new UPCALL thread. If the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is automatically shut down. An upcall owner thread also exits when process is in exiting state. When an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. It represents a virtual cpu. When a thread is running, it always has a KSE associated with it. Scheduler is free to assign a KSE to thread according to thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. The N is the number of physical cpu in the current system. This makes it possible that, even if a userland UTS is only single CPU safe, threads in kernel can still execute on different cpus in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrency in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian
# 107414	30-Nov-2002	mckusick	Remove a race condition / deadlock from snapshots. When converting from individual vnode locks to the snapshot lock, be sure to pass any waiting processes along to the new lock as well. This transfer is done by a new function in the lock manager, transferlockers(from_lock, to_lock); Thanks to Lamont Granquist <lamont@scriptkiddie.org> for his help in pounding on snapshots beyond all reason and finding this deadlock. Sponsored by: DARPA & NAI Labs.
# 105370	17-Oct-2002	mckusick	Have lockinit() initialize the debugging fields of a lock when DEBUG_LOCKS is defined. Sponsored by: DARPA & NAI Labs.
# 102477	27-Aug-2002	bde	Include <sys/lockmgr.h> for the definitions of the locking interfaces that are implemented here instead of depending on namespace pollution in <sys/lock.h>. Fixed nearby include messes (1 disordered include and 1 unused include).
# 102412	25-Aug-2002	charnier	Replace various spelling with FALLTHROUGH which is lint()able
# 97540	30-May-2002	jeff	Record the file, line, and pid of the last successful shared lock holder. This is useful as a last effort in debugging file system deadlocks. This is enabled via 'options DEBUG_LOCKS'
# 93818	04-Apr-2002	jhb	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 91698	05-Mar-2002	eivind	Change wmesg to const char * instead of char *
# 88318	20-Dec-2001	dillon	Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release
# 86333	13-Nov-2001	dillon	Create a mutex pool API for short term leaf mutexes. Replace the manual mutex pool in kern_lock.c (lockmgr locks) with the new API. Replace the mutexes embedded in sxlocks with the new API.
# 84812	11-Oct-2001	jhb	Add missing includes of sys/ktr.h.
# 84781	10-Oct-2001	jhb	Malloc mutexes pre-zero'd as random garbage (including 0xdeadcode) may trigger the check to make sure we don't initialize a mutex twice.
# 83420	13-Sep-2001	jhb	Fix locking on td_flags for TDF_DEADLKTREAT. If the comments in the code are true that curthread can change during this function, then this flag needs to become a KSE flag, not a thread flag.
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 81506	10-Aug-2001	jhb	If we've panic'd already, then just bail in lockmgr rather than blocking or possibly panic'ing again.
# 76100	28-Apr-2001	alfred	Instead of asserting that a mutex is not still locked after unlocking it, assert that the mutex is owned and not recursed prior to unlocking it. This should give a clearer diagnostic when a programming error is caught.
# 75740	20-Apr-2001	alfred	Assert that when using an interlock mutex it is not recursed when lockmgr() is called. Ok'd by: jhb
# 75472	13-Apr-2001	alfred	convert if/panic -> KASSERT, explain what triggered the assertion
# 75304	08-Apr-2001	jake	Fix a precedence bug. ! has higher precedence than &.
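The class of bug named in r75304 can be illustrated with a short, self-contained C sketch. The log does not quote the expression that was fixed, so the flag `EX_FLAG_WANTED` and both helper functions below are hypothetical and serve only to show how `!` binding tighter than `&` changes the result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit, chosen only for this illustration. */
#define EX_FLAG_WANTED	0x04u

/*
 * Buggy form.  Writing `!flags & EX_FLAG_WANTED` is parsed as
 * `(!flags) & EX_FLAG_WANTED`; the parentheses are spelled out here so
 * the parse is visible.  `!flags` is 0 or 1, and neither value has bit
 * 0x04 set, so the test can never be true.
 */
static bool
flag_clear_buggy(unsigned flags)
{
	return (((!flags) & EX_FLAG_WANTED) != 0);
}

/* Intended form: mask the bit first, then negate the result. */
static bool
flag_clear_fixed(unsigned flags)
{
	return (!(flags & EX_FLAG_WANTED));
}
```

With `flags == 0` the fixed helper reports the bit as clear while the buggy one does not, which is the kind of silent misbehaviour such a fix removes.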
# 72227	09-Feb-2001	jhb	Proc locking.
# 72200	09-Feb-2001	bmilekic	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarly, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. 
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 71576	24-Jan-2001	jasone	Convert all simplelocks to mutexes and remove the simplelock implementations.
# 71320	21-Jan-2001	jasone	Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex initialization until after malloc() is safe to call, then iterate through all mutexes and complete their initialization. This change is necessary in order to avoid some circular bootstrapping dependencies.
# 69432	01-Dec-2000	jake	Use msleep instead of mtx_exit; tsleep; mtx_enter, which is not safe.
# 67353	20-Oct-2000	jhb	- machine/mutex.h -> sys/mutex.h - The initial lock_mtx mutex used in the lockmgr code is initialized very early, so use MUTEX_DECLARE() and MTX_COLD.
# 67046	12-Oct-2000	jasone	For lockmgr mutex protection, use an array of mutexes that are allocated and initialized during boot. This avoids bloating sizeof(struct lock). As a side effect, it is no longer necessary to enforce the assumption that lockinit()/lockdestroy() calls are paired, so the LK_VALID flag has been removed. Idea taken from: BSD/OS.
# 66615	03-Oct-2000	jasone	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
# 66296	23-Sep-2000	ps	Move MAXCPU from machine/smp.h to machine/param.h to fix breakage with !SMP kernels. Also, replace NCPUS with MAXCPU since they are redundant.
# 65932	16-Sep-2000	phk	Make LINT compile.
# 58132	16-Mar-2000	phk	Eliminate the undocumented, experimental, non-delivering and highly dangerous MAX_PERF option.
# 54444	11-Dec-1999	eivind	Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter
# 53090	11-Nov-1999	alc	Correct a locking error in apause: It should always hold the simple lock when it returns. Also, eliminate spinning on a uniprocessor. It's pointless. Submitted by: bde, Assar Westerlund <assar@sics.se>
# 51702	26-Sep-1999	dillon	Fix process p_locks accounting. Conversions of the owner to LK_KERNPROC caused p_locks to be improperly accounted. Submitted by: Tor.Egge@fast.no
# 50477	27-Aug-1999	peter	$Id$ -> $FreeBSD$
# 48301	28-Jun-1999	mckusick	When requesting an exclusive lock with LK_NOWAIT, do not panic if LK_RECURSIVE is not set, as we will simply return that the lock is busy and not actually deadlock. This allows processes to use polling locks against buffers that they may already hold exclusively locked.
# 48225	26-Jun-1999	mckusick	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
# 44772	15-Mar-1999	julian	fix breakage for alphas. Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
# 44681	12-Mar-1999	julian	This solves a deadlock that can occur when read()ing into a file-mmap() space. When doing this, it is possible for another process to attempt to get an exclusive lock on the vnode and deadlock the mmap/read combination when the uiomove() call tries to obtain a second shared lock on the vnode. There is still a potential deadlock situation with write()/mmap(). Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Luoqi Chen <luoqi@freebsd.org> Delimited by tag PRE_MATT_MMAP_LOCK and POST_MATT_MMAP_LOCK in kern/kern_lock.c kern/kern_subr.c
# 42900	20-Jan-1999	eivind	Add 'options DEBUG_LOCKS', which stores extra information in struct lock, and add some macros and function parameters to make sure that the information gets to the point where it can be put in the lock structure. While I'm here, add DEBUG_VFS_LOCKS to LINT.
# 42453	09-Jan-1999	eivind	KNFize, by bde.
# 42408	08-Jan-1999	eivind	Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith
# 41362	26-Nov-1998	eivind	Staticize.
# 35242	17-Apr-1998	bde	Really finish supporting compiling with `gcc -ansi'.
# 34194	07-Mar-1998	dyson	Some kern_lock code improvements. Add missing wakeup, and enable disabling some diagnostics when memory or speed is at a premium.
# 33232	10-Feb-1998	eivind	Include SIMPLELOCK_DEBUG functions even if SMP if compiling LINT; give an error for the combination if _not_ compiling LINT.
# 33134	06-Feb-1998	eivind	Back out DIAGNOSTIC changes.
# 33108	04-Feb-1998	eivind	Turn DIAGNOSTIC into a new-style option.
# 31016	07-Nov-1997	phk	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused
# 30813	28-Oct-1997	bde	Removed unused #includes.
# 29653	21-Sep-1997	dyson	Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the usage of malloc by half. The zone allocator will be upgradeable to be able to use per-CPU pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.
# 28569	22-Aug-1997	phk	typo in comment.
# 28393	18-Aug-1997	dyson	Allow lockmgr to work without a current process. Disallowing that was a mistake in the lockmgr rewrite.
# 28349	18-Aug-1997	fsmp	Added includes of smp.h for SMP. This eliminates a bazillion warnings about implicit s_lock & friends.
# 28345	18-Aug-1997	dyson	Fix kern_lock so that it will work. Additionally, clean up some of the VM system's usage of the kernel lock (lockmgr) code. This is a first pass implementation, and is expected to evolve as needed. The API for the lock manager code has not changed, but the underlying implementation has changed significantly. This change should not materially affect our current SMP or UP code without non-standard parameters being used.
# 27894	04-Aug-1997	fsmp	pushed down "volatility" of simplelock to actual int inside the struct. Submitted by: bde@zeta.org.au, smp@csn.net
# 24480	01-Apr-1997	bde	Fixed commented-out Lite2 sysctl debug.lockpausetime. Removed unused #includes.
# 24277	25-Mar-1997	peter	Add missing $Id$ Note; the RCS file has also been reconstructed to have a CSRG vendor branch.
# 24274	25-Mar-1997	peter	Replace original rev 1.3; Author: bde; Date: 1997/02/25 17:24:43; Fix counting of simplelocks in SIMPLELOCK_DEBUG Fix style regression
# 24273	25-Mar-1997	peter	Replace original rev 1.2; Author: mpp; Date: 1997/02/12 06:52:30 Add missing #include <sys/systm.h>
# 24271	25-Mar-1997	peter	Replace original revision 1.1; Author dyson; Date: 1997/02/10 02:28:15 Changes from Lite2: - DEBUG -> SIMPLELOCK_DEBUG - cosmetic fixes - bzero of lock at init time -> explicit init of members.
# 24270	25-Mar-1997	peter	This commit was generated by cvs2svn to compensate for changes in r24269, which included commits to RCS files with non-trunk default branches.
# 24269	25-Mar-1997	peter	Import 4.4BSD-Lite2 onto CSRG branch