History log of /netbsd-current/sys/kern/kern_sleepq.c
Revision Date Author Comments
# 1.87 02-Nov-2023 martin

Back out the following revisions on behalf of core:

sys/sys/lwp.h: revision 1.228
sys/sys/pipe.h: revision 1.40
sys/kern/uipc_socket.c: revision 1.306
sys/kern/kern_sleepq.c: revision 1.84
sys/rump/librump/rumpkern/locks_up.c: revision 1.13
sys/kern/sys_pipe.c: revision 1.165
usr.bin/fstat/fstat.c: revision 1.119
sys/rump/librump/rumpkern/locks.c: revision 1.87
sys/ddb/db_xxx.c: revision 1.78
sys/ddb/db_command.c: revision 1.187
sys/sys/condvar.h: revision 1.18
sys/ddb/db_interface.h: revision 1.42
sys/sys/socketvar.h: revision 1.166
sys/kern/uipc_syscalls.c: revision 1.209
sys/kern/kern_condvar.c: revision 1.60

Add cv_fdrestart() [...]
Use cv_fdrestart() to implement fo_restart.
Simplify/streamline pipes a little bit [...]

These changes have caused regressions and need to be debugged.
The cv_fdrestart() addition needs more discussion.


# 1.86 15-Oct-2023 riastradh

kern_sleepq.c: Sort includes. No functional change intended.


# 1.85 15-Oct-2023 riastradh

sys/lwp.h: Nix sys/syncobj.h dependency.

Remove it in ddb/db_syncobj.h too.

New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.

Sprinkle #include <sys/syncobj.h> in .c files where it is now needed.


# 1.84 13-Oct-2023 ad

Add cv_fdrestart() (better name suggestions welcome):

Like cv_broadcast(), but make any LWPs that share the same file descriptor
table as the caller return ERESTART when resuming. Used to dislodge LWPs
waiting for I/O that prevent a file descriptor from being closed, without
upsetting access to the file (not descriptor) made from another direction.


# 1.83 08-Oct-2023 ad

Oops, fix inverted test.


# 1.82 08-Oct-2023 ad

Ensure that an LWP that has taken a legitimate wakeup never produces an
error code from sleepq_block(). Then, it's possible to make cv_signal()
work as expected and only ever wake a singular LWP.


# 1.81 08-Oct-2023 ad

sleepq_block(): slightly reduce number of test+branch in the common case.


# 1.80 07-Oct-2023 ad

sleepq_uncatch(): fix typo that's been there since 2020, hello @thorpej lol:

- l->l_flag = ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
+ l->l_flag &= ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
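
The difference between the two lines above can be reproduced in userland. The following is a minimal sketch, not the kernel code: the flag values and the LW_OTHER stand-in are hypothetical (the real LW_* constants are defined in sys/lwp.h), and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values; the real LW_* constants live in sys/lwp.h. */
#define LW_SINTR	0x01u
#define LW_CATCHINTR	0x02u
#define LW_STIMO	0x04u
#define LW_OTHER	0x80u	/* stand-in for any unrelated flag */

/* Old form: plain assignment replaces the whole flag word. */
static uint32_t
uncatch_buggy(uint32_t flag)
{
	flag = ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
	return flag;
}

/* Fixed form: clear only the three named bits, keep the rest. */
static uint32_t
uncatch_fixed(uint32_t flag)
{
	flag &= ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
	return flag;
}

/* Illustrative self-check (hypothetical helper). */
static void
uncatch_demo(void)
{
	uint32_t before = LW_SINTR | LW_OTHER;

	/* The fix leaves unrelated flags alone. */
	assert(uncatch_fixed(before) == LW_OTHER);
	/* The typo sets every bit except the three it meant to clear. */
	assert(uncatch_buggy(before) == 0xFFFFFFF8u);
}
```

With plain assignment, every flag bit other than the three named ones ends up set, regardless of the previous state of the word.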


# 1.79 07-Oct-2023 ad

sleepq_uncatch(): clear LW_STIMO too, so that there's no possibility that
the newly non-interruptable sleep could produce EWOULDBLOCK (paranoia).


# 1.78 05-Oct-2023 ad

Resolve !MULTIPROCESSOR build problem with the nasty kernel lock macros.


# 1.77 04-Oct-2023 ad

Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.


# 1.76 23-Sep-2023 ad

Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.


# 1.75 23-Sep-2023 ad

- Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).


# 1.74 09-Apr-2023 riastradh

kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
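
The commit message does not state a motivation. One commonly cited benefit of splitting a combined assertion is that the failure report names the exact condition that did not hold. A minimal userland sketch of that effect follows; MY_KASSERT is a hypothetical stand-in that records the failing expression text instead of panicking, and is not the kernel's KASSERT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for KASSERT: remember the first failing expression. */
static const char *last_failure = NULL;
#define MY_KASSERT(e) \
	do { \
		if (!(e) && last_failure == NULL) \
			last_failure = #e; \
	} while (0)

/* Combined form: a failure reports the whole conjunction. */
static const char *
check_combined(int a, int b)
{
	last_failure = NULL;
	MY_KASSERT(a > 0 && b > 0);
	return last_failure;
}

/* Split form: a failure reports only the condition that was false. */
static const char *
check_split(int a, int b)
{
	last_failure = NULL;
	MY_KASSERT(a > 0);
	MY_KASSERT(b > 0);
	return last_failure;
}

/* Illustrative self-check (hypothetical helper). */
static void
kassert_demo(void)
{
	assert(strcmp(check_combined(1, 0), "a > 0 && b > 0") == 0);
	assert(strcmp(check_split(1, 0), "b > 0") == 0);
	assert(check_split(1, 1) == NULL);
}
```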


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.73 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.72 29-Jun-2022 riastradh

ktrace(9): Fix mutex detection in ktrcsw.

On _entry_ to sleepq_block, l->l_syncobj is set so that ktrcsw
(ktr_csw) has the opportunity to detect whether it's a mutex or
rwlock. It is critical to avoid ktealloc when we're sleeping on a
mutex because we may be in softint context where ktealloc is
forbidden.

But after mi_switch, on _exit_ from sleepq_block, l->l_syncobj may
have been changed back to &sched_syncobj or something by
sleepq_remove, and so ktrcsw can no longer rely on l->l_syncobj to
determine whether we _were_ sleeping on a mutex or not.

Instead, save the syncobj in sleepq_block and pass it through as an
argument to ktrcsw.

Reported-by: syzbot+414edba9d161b7502658@syzkaller.appspotmail.com
Reported-by: syzbot+4425c97ac717b12495a2@syzkaller.appspotmail.com
Reported-by: syzbot+5812565b926ee8eb5cf3@syzkaller.appspotmail.com
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
Reported-by: syzbot+909a8e743c967d97f433@syzkaller.appspotmail.com
Reported-by: syzbot+e2a34bb5509bea0bba11@syzkaller.appspotmail.com
Reported-by: syzbot+faaea3aad6c9d0829f76@syzkaller.appspotmail.com
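
The idea in revisions 1.72 and 1.73, deciding from a saved syncobj rather than re-reading l->l_syncobj after it may have been reset, can be modelled in userland. This is a simplified sketch under stated assumptions: the struct layouts, the is_mutex field, and every function name here are hypothetical, and fake_sleepq_remove() merely stands in for another thread resetting the field while the LWP is switched out.

```c
#include <assert.h>

/* Hypothetical, heavily simplified models of the kernel structures. */
struct syncobj { int is_mutex; };
struct lwp { const struct syncobj *l_syncobj; };

static const struct syncobj sched_so = { 0 };	/* models sched_syncobj */
static const struct syncobj mutex_so = { 1 };	/* models a mutex syncobj */

/* Returns 1 if a trace record would be allocated (not wanted for mutexes). */
static int
would_allocate(const struct syncobj *so)
{
	return !so->is_mutex;
}

/* Stand-in for a concurrent wakeup resetting the field. */
static void
fake_sleepq_remove(struct lwp *l)
{
	l->l_syncobj = &sched_so;
}

/* Old pattern: consult the field again after it may have changed. */
static int
block_rereading_field(struct lwp *l)
{
	fake_sleepq_remove(l);		/* models the switch and wakeup */
	return would_allocate(l->l_syncobj);
}

/* New pattern: take a snapshot first and decide from the snapshot. */
static int
block_with_snapshot(struct lwp *l)
{
	const struct syncobj *so = l->l_syncobj;

	fake_sleepq_remove(l);
	return would_allocate(so);
}

/* Illustrative self-check (hypothetical helper). */
static void
syncobj_demo(void)
{
	struct lwp l = { &mutex_so };

	assert(block_rereading_field(&l) == 1);	/* wrongly allocates */
	l.l_syncobj = &mutex_so;
	assert(block_with_snapshot(&l) == 0);	/* correctly skips */
}
```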


# 1.71 08-Apr-2022 andvar

fix various typos, mainly in comments, but also log messages, docs, game text.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.
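
FIFO ordering among equal-priority waiters in a priority-sorted queue can be sketched as follows. This is an illustrative singly linked list, not the sleepq_insert() implementation; struct waiter and sorted_insert_fifo() are hypothetical names, and a larger pri value is assumed to mean higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter record; larger pri means higher priority. */
struct waiter {
	int id;
	int pri;
	struct waiter *next;
};

/*
 * Insert w into a list kept in descending priority order. The newcomer
 * goes behind every existing waiter of equal priority, so waiters of
 * the same priority are kept in arrival (FIFO) order.
 */
static void
sorted_insert_fifo(struct waiter **head, struct waiter *w)
{
	struct waiter **pp = head;

	assert(head != NULL && w != NULL);
	/* Skip every waiter whose priority is >= the newcomer's. */
	while (*pp != NULL && (*pp)->pri >= w->pri)
		pp = &(*pp)->next;
	w->next = *pp;
	*pp = w;
}
```

Using `>=` rather than `>` in the loop is what gives FIFO order when every waiter has the same priority; with `>` the newcomer would be placed ahead of its equals.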


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptable
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug to lose biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides a
real-time (RT) and time-sharing (TS) queues, ready to support a POSIX
real-time extensions, and also prepared for the support of CPU affinity.

The following lines in the kernel config enables the SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.86 15-Oct-2023 riastradh

kern_sleepq.c: Sort includes. No functional change intended.


# 1.85 15-Oct-2023 riastradh

sys/lwp.h: Nix sys/syncobj.h dependency.

Remove it in ddb/db_syncobj.h too.

New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.

Sprinkle #include <sys/syncobj.h> in .c files where it is now needed.


# 1.84 13-Oct-2023 ad

Add cv_fdrestart() (better name suggestions welcome):

Like cv_broadcast(), but make any LWPs that share the same file descriptor
table as the caller return ERESTART when resuming. Used to dislodge LWPs
waiting for I/O that prevent a file descriptor from being closed, without
upsetting access to the file (not descriptor) made from another direction.


# 1.83 08-Oct-2023 ad

Oops, fix inverted test.


# 1.82 08-Oct-2023 ad

Ensure that an LWP that has taken a legitimate wakeup never produces an
error code from sleepq_block(). Then, it's possible to make cv_signal()
work as expected and only ever wake a singular LWP.


# 1.81 08-Oct-2023 ad

sleepq_block(): slightly reduce number of test+branch in the common case.


# 1.80 07-Oct-2023 ad

sleepq_uncatch(): fix typo that's been there since 2020, hello @thorpej lol:

- l->l_flag = ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
+ l->l_flag &= ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);


# 1.79 07-Oct-2023 ad

sleepq_uncatch(): clear LW_STIMO too, so that there's no possibility that
the newly non-interruptable sleep could produce EWOULDBLOCK (paranoia).


# 1.78 05-Oct-2023 ad

Resolve !MULTIPROCESSOR build problem with the nasty kernel lock macros.


# 1.77 04-Oct-2023 ad

Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.


# 1.76 23-Sep-2023 ad

Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.


# 1.75 23-Sep-2023 ad

- Simplify how priority boost for blocking in kernel is handled. Rather
  than setting it up at each site where we block, make it a property of
  syncobj_t. Then, do not hang onto the priority boost until userret(),
  drop it as soon as the LWP is out of the run queue and onto a CPU.
  Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
  simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
  like lwp_lock() which turn out not to be small after all (I don't know
  why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
  beyond what volatile does).


# 1.74 09-Apr-2023 riastradh

kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
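
One likely motivation for this kind of split is diagnostics: a failed assertion reports the expression text, so separate assertions name the exact condition that failed. A minimal sketch, using a hypothetical `EX_KASSERT` macro that records the failing expression instead of panicking:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Text of the first failing expression, or NULL if none failed. */
static const char *ex_first_failure;

/* Hypothetical stand-in for KASSERT(): record instead of panic. */
#define EX_KASSERT(e)							\
	do {								\
		if (!(e) && ex_first_failure == NULL)			\
			ex_first_failure = #e;				\
	} while (0)

/* Combined form: the report cannot say which half failed. */
static const char *
ex_check_combined(int a, int b)
{
	ex_first_failure = NULL;
	EX_KASSERT(a > 0 && b > 0);
	return ex_first_failure;
}

/* Split form: the report names the exact condition. */
static const char *
ex_check_split(int a, int b)
{
	ex_first_failure = NULL;
	EX_KASSERT(a > 0);
	EX_KASSERT(b > 0);
	return ex_first_failure;
}
```

Calling both with `(1, 0)` gives a report that covers the whole conjunction in the combined form and only `b > 0` in the split form.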


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.73 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

    sleepq_enter(sq, l, lock);            // locks l
    ...
    sleepq_enqueue(sq, ..., sobj, ...);   // assumes l locked, sets l_syncobj
    ...                                   (*)
    sleepq_block(...);                    // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.72 29-Jun-2022 riastradh

ktrace(9): Fix mutex detection in ktrcsw.

On _entry_ to sleepq_block, l->l_syncobj is set so that ktrcsw
(ktr_csw) has the opportunity to detect whether it's a mutex or
rwlock. It is critical to avoid ktealloc when we're sleeping on a
mutex because we may be in softint context where ktealloc is
forbidden.

But after mi_switch, on _exit_ from sleepq_block, l->l_syncobj may
have been changed back to &sched_syncobj or something by
sleepq_remove, and so ktrcsw can no longer rely on l->l_syncobj to
determine whether we _were_ sleeping on a mutex or not.

Instead, save the syncobj in sleepq_block and pass it through as an
argument to ktrcsw.

Reported-by: syzbot+414edba9d161b7502658@syzkaller.appspotmail.com
Reported-by: syzbot+4425c97ac717b12495a2@syzkaller.appspotmail.com
Reported-by: syzbot+5812565b926ee8eb5cf3@syzkaller.appspotmail.com
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
Reported-by: syzbot+909a8e743c967d97f433@syzkaller.appspotmail.com
Reported-by: syzbot+e2a34bb5509bea0bba11@syzkaller.appspotmail.com
Reported-by: syzbot+faaea3aad6c9d0829f76@syzkaller.appspotmail.com
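
The fix described in this entry follows a general pattern: snapshot a field before a call that may let someone else rewrite it, then use the snapshot afterwards. A minimal sketch with hypothetical miniature types (the `ex_` names mirror the log but are not the kernel's definitions):

```c
#include <assert.h>
#include <stdbool.h>

struct ex_syncobj {
	bool is_mutex;
};

struct ex_lwp {
	const struct ex_syncobj *l_syncobj;
};

static const struct ex_syncobj ex_sched_syncobj = { false };
static const struct ex_syncobj ex_mutex_syncobj = { true };

/* Stand-in for the context switch: a waker resets l_syncobj. */
static void
ex_switch(struct ex_lwp *l)
{
	l->l_syncobj = &ex_sched_syncobj;
}

/* Fragile: re-reads l_syncobj after it may have been reset. */
static bool
ex_was_mutex_reread(struct ex_lwp *l)
{
	ex_switch(l);
	return l->l_syncobj->is_mutex;
}

/* Robust: saves the syncobj on entry and uses the saved copy. */
static bool
ex_was_mutex_saved(struct ex_lwp *l)
{
	const struct ex_syncobj *sobj = l->l_syncobj;

	ex_switch(l);
	return sobj->is_mutex;
}
```

Only the saved variant still knows, after the switch, that the sleep was on a mutex.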


# 1.71 08-Apr-2022 andvar

fix various typos, mainly in comments, but also log messages, docs, game text.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
  the intent to catch signals while sleeping. Initialize this flag based
  on the catch_p argument to sleepq_block(), and rather than test catch_p
  when awakened, test LW_CATCHINTR. This allows the intent to change
  (based on whatever criteria the owner of the sleepq wishes) while the
  LWP is asleep. This is separate from LW_SINTR in order to leave all
  other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
  argument. Also allow the new LWP lock argument to be NULL, which
  will cause the lwp_setlock() call to be skipped; this allows transfer
  to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
  from "interruptible sleep" to "uninterruptible sleep" on its current
  sleepq.


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.
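
The ordering rule can be sketched with a plain singly linked list. This is not the sleepq_insert() code: the `ex_` names are hypothetical, and the sketch assumes that a larger number means higher priority.

```c
#include <assert.h>
#include <stddef.h>

struct ex_waiter {
	int id;
	int prio;		/* assumed: larger = higher priority */
	struct ex_waiter *next;
};

/*
 * Insert w so the list stays sorted by descending priority and w
 * lands behind every waiter of equal priority (FIFO among equals).
 */
static void
ex_insert_sorted(struct ex_waiter **head, struct ex_waiter *w)
{
	struct ex_waiter **pp = head;

	/* ">=" rather than ">" is what yields FIFO order for ties. */
	while (*pp != NULL && (*pp)->prio >= w->prio)
		pp = &(*pp)->next;
	w->next = *pp;
	*pp = w;
}
```

Using `>` in the loop condition would place a new waiter ahead of its equals, which is the LIFO behaviour the commit removes.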


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
  again after wakeup. Previously it could panic because cv_signal() could
  be called by cv_wait_sig() + others:

      cv_broadcast(cv);
      cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
  cv_broadcast() or cv_signal(), don't return an error if the timer
  fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
  questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.
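
The size saving comes from the queue head: with the usual <sys/queue.h> macros, a TAILQ head carries a first pointer and a last pointer, while a LIST head carries only a first pointer. A small sketch, assuming a system that ships <sys/queue.h> (the BSDs and glibc do); the `ex_` type names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Two toy element types, one per queue flavour. */
struct ex_list_lwp {
	LIST_ENTRY(ex_list_lwp) l_chain;
};

struct ex_tailq_lwp {
	TAILQ_ENTRY(ex_tailq_lwp) l_chain;
};

LIST_HEAD(ex_list_head, ex_list_lwp);
TAILQ_HEAD(ex_tailq_head, ex_tailq_lwp);

/* Bytes saved per queue head by switching from TAILQ to LIST. */
static size_t
ex_head_saving(void)
{
	return sizeof(struct ex_tailq_head) - sizeof(struct ex_list_head);
}
```

The trade-off is that a LIST has no cheap tail insertion, which is consistent with the FIFO method being dropped in the same change.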


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal), which does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that loses biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2, the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

    no options SCHED_4BSD
    options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.84 13-Oct-2023 ad

Add cv_fdrestart() (better name suggestions welcome):

Like cv_broadcast(), but make any LWPs that share the same file descriptor
table as the caller return ERESTART when resuming. Used to dislodge LWPs
waiting for I/O that prevent a file descriptor from being closed, without
upsetting access to the file (not descriptor) made from another direction.


# 1.83 08-Oct-2023 ad

Oops, fix inverted test.


# 1.82 08-Oct-2023 ad

Ensure that an LWP that has taken a legitimate wakeup never produces an
error code from sleepq_block(). Then, it's possible to make cv_signal()
work as expected and only ever wake a singular LWP.


# 1.81 08-Oct-2023 ad

sleepq_block(): slightly reduce number of test+branch in the common case.


# 1.80 07-Oct-2023 ad

sleepq_uncatch(): fix typo that's been there since 2020, hello @thorpej lol:

- l->l_flag = ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
+ l->l_flag &= ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);


# 1.79 07-Oct-2023 ad

sleepq_uncatch(): clear LW_STIMO too, so that there's no possibility that
the newly non-interruptable sleep could produce EWOULDBLOCK (paranoia).


# 1.78 05-Oct-2023 ad

Resolve !MULTIPROCESSOR build problem with the nasty kernel lock macros.


# 1.77 04-Oct-2023 ad

Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.


# 1.76 23-Sep-2023 ad

Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.


# 1.75 23-Sep-2023 ad

- Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).


# 1.74 09-Apr-2023 riastradh

kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.73 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.72 29-Jun-2022 riastradh

ktrace(9): Fix mutex detection in ktrcsw.

On _entry_ to sleepq_block, l->l_syncobj is set so that ktrcsw
(ktr_csw) has the opportunity to detect whether it's a mutex or
rwlock. It is critical to avoid ktealloc when we're sleeping on a
mutex because we may be in softint context where ktealloc is
forbidden.

But after mi_switch, on _exit_ from sleepq_block, l->l_syncobj may
have been changed back to &sched_syncobj or something by
sleepq_remove, and so ktrcsw can no longer rely on l->l_syncobj to
determine whether we _were_ sleeping on a mutex or not.

Instead, save the syncobj in sleepq_block and pass it through as an
argument to ktrcsw.

Reported-by: syzbot+414edba9d161b7502658@syzkaller.appspotmail.com
Reported-by: syzbot+4425c97ac717b12495a2@syzkaller.appspotmail.com
Reported-by: syzbot+5812565b926ee8eb5cf3@syzkaller.appspotmail.com
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
Reported-by: syzbot+909a8e743c967d97f433@syzkaller.appspotmail.com
Reported-by: syzbot+e2a34bb5509bea0bba11@syzkaller.appspotmail.com
Reported-by: syzbot+faaea3aad6c9d0829f76@syzkaller.appspotmail.com


# 1.71 08-Apr-2022 andvar

fix various typos, mainly in comments, but also log messages, docs, game text.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.
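
A minimal user-space sketch of the ordering rule described above, not the kernel's sleepq_insert(). The struct waiter type, its fields, and waitq_insert_sorted() are hypothetical names used only for illustration. The point is the comparison: by skipping every queued entry whose priority is greater than or equal to the new one, the new waiter always lands behind existing waiters of the same priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an LWP on a sleep queue; not the kernel's lwp_t. */
struct waiter {
	int		w_pri;		/* effective priority, higher is served first */
	int		w_id;		/* arrival tag, for illustration only */
	struct waiter	*w_next;
};

/*
 * Insert w into a singly linked queue kept in descending priority
 * order.  Walking past every entry whose priority is >= the new one
 * places w behind existing waiters of equal priority (FIFO), whether
 * or not lower-priority waiters are present.
 */
static void
waitq_insert_sorted(struct waiter **head, struct waiter *w)
{
	struct waiter **pp = head;

	assert(w != NULL);
	while (*pp != NULL && (*pp)->w_pri >= w->w_pri)
		pp = &(*pp)->w_next;
	w->w_next = *pp;
	*pp = w;
}
```

Using `>` instead of `>=` in the loop condition would put the new waiter ahead of its equal-priority peers, which is the LIFO behaviour the commit describes as the bug.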


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptable
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug to lose biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides a
real-time (RT) and time-sharing (TS) queues, ready to support a POSIX
real-time extensions, and also prepared for the support of CPU affinity.

The following lines in the kernel config enables the SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.83 08-Oct-2023 ad

Oops, fix inverted test.


# 1.82 08-Oct-2023 ad

Ensure that an LWP that has taken a legitimate wakeup never produces an
error code from sleepq_block(). Then, it's possible to make cv_signal()
work as expected and only ever wake a singular LWP.


# 1.81 08-Oct-2023 ad

sleepq_block(): slightly reduce number of test+branch in the common case.


# 1.80 07-Oct-2023 ad

sleepq_uncatch(): fix typo that's been there since 2020, hello @thorpej lol:

- l->l_flag = ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
+ l->l_flag &= ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
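
To make the effect of the one-character difference concrete, here is a small standalone sketch. The X_* bit values and the two helper functions are hypothetical; the real flag definitions live in sys/lwp.h and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define	X_SINTR		0x01u
#define	X_CATCHINTR	0x02u
#define	X_STIMO		0x04u
#define	X_OTHER		0x80u	/* any unrelated flag that must survive */

/* Buggy form: overwrites the word, setting every bit outside the mask. */
static uint32_t
clear_flags_buggy(uint32_t flag)
{
	flag = ~(X_SINTR | X_CATCHINTR | X_STIMO);
	return flag;
}

/* Fixed form: clears only the three sleep-related bits. */
static uint32_t
clear_flags_fixed(uint32_t flag)
{
	flag &= ~(X_SINTR | X_CATCHINTR | X_STIMO);
	return flag;
}
```

With plain assignment the previous contents of the flag word are discarded and every bit outside the mask ends up set, whereas the and-assign form leaves unrelated flags such as X_OTHER exactly as they were.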


# 1.79 07-Oct-2023 ad

sleepq_uncatch(): clear LW_STIMO too, so that there's no possibility that
the newly non-interruptable sleep could produce EWOULDBLOCK (paranoia).


# 1.78 05-Oct-2023 ad

Resolve !MULTIPROCESSOR build problem with the nasty kernel lock macros.


# 1.77 04-Oct-2023 ad

Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.


# 1.76 23-Sep-2023 ad

Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.


# 1.75 23-Sep-2023 ad

- Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).


# 1.74 09-Apr-2023 riastradh

kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.73 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.72 29-Jun-2022 riastradh

ktrace(9): Fix mutex detection in ktrcsw.

On _entry_ to sleepq_block, l->l_syncobj is set so that ktrcsw
(ktr_csw) has the opportunity to detect whether it's a mutex or
rwlock. It is critical to avoid ktealloc when we're sleeping on a
mutex because we may be in softint context where ktealloc is
forbidden.

But after mi_switch, on _exit_ from sleepq_block, l->l_syncobj may
have been changed back to &sched_syncobj or something by
sleepq_remove, and so ktrcsw can no longer rely on l->l_syncobj to
determine whether we _were_ sleeping on a mutex or not.

Instead, save the syncobj in sleepq_block and pass it through as an
argument to ktrcsw.
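
The idea of snapshotting the syncobj before switching can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: struct syncobj, struct thread, trace_switch(), fake_switch() and block() are all hypothetical stand-ins, and the "context switch" merely resets the field the way a concurrent waker might.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for syncobj_t and lwp_t. */
struct syncobj {
	bool so_is_mutex;	/* tracing must not allocate for these */
};

static const struct syncobj sched_so = { false };	/* "not sleeping" */
static const struct syncobj mutex_so = { true };
static const struct syncobj cv_so = { false };

struct thread {
	const struct syncobj *t_syncobj;
};

static int records_logged;	/* counts trace records "allocated" */

/* Stand-in for the tracing hook: skip records for mutex sleeps. */
static void
trace_switch(const struct syncobj *so)
{
	if (!so->so_is_mutex)
		records_logged++;
}

/* Stand-in for the switch: a waker resets t_syncobj behind our back. */
static void
fake_switch(struct thread *t)
{
	t->t_syncobj = &sched_so;
}

static void
block(struct thread *t)
{
	const struct syncobj *so = t->t_syncobj;	/* snapshot on entry */

	trace_switch(so);
	fake_switch(t);
	trace_switch(so);	/* use the snapshot, not t->t_syncobj */
}
```

Had the second trace_switch() call read t->t_syncobj instead of the saved pointer, a mutex sleep would look like an ordinary sleep on the way out and a record would be logged where none is allowed.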

Reported-by: syzbot+414edba9d161b7502658@syzkaller.appspotmail.com
Reported-by: syzbot+4425c97ac717b12495a2@syzkaller.appspotmail.com
Reported-by: syzbot+5812565b926ee8eb5cf3@syzkaller.appspotmail.com
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
Reported-by: syzbot+909a8e743c967d97f433@syzkaller.appspotmail.com
Reported-by: syzbot+e2a34bb5509bea0bba11@syzkaller.appspotmail.com
Reported-by: syzbot+faaea3aad6c9d0829f76@syzkaller.appspotmail.com


# 1.71 08-Apr-2022 andvar

fix various typos, mainly in comments, but also log messages, docs, game text.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptable
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.
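
Because timo = 0 means "no timeout", a caller that converts a finite
duration into ticks has to make sure rounding never produces 0. The
following is a minimal sketch of such a guard. finite_timo_ticks() and its
parameters are hypothetical names used for illustration; they are not part
of the kernel API.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper (not a NetBSD function): convert a finite wait of
 * `ms` milliseconds into a tick count for a clock running at `hz` ticks
 * per second.  The result is rounded up and is never 0, because a timo
 * of 0 would be interpreted as an infinite timeout.
 */
int
finite_timo_ticks(unsigned int ms, unsigned int hz)
{
	unsigned long long ticks;

	assert(hz > 0);

	/* Round up so that a short wait does not truncate to 0 ticks. */
	ticks = ((unsigned long long)ms * hz + 999) / 1000;
	if (ticks == 0)
		ticks = 1;		/* 0 would mean "wait forever" */
	if (ticks > INT_MAX)
		ticks = INT_MAX;	/* clamp to the range of an int timo */
	return (int)ticks;
}
```

A caller that really wants an unbounded wait would pass 0 explicitly
rather than rely on a computed value happening to reach 0.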


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.
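
The difference between consuming a pending signal and merely looking at it
can be sketched with a plain bitmask. peek_signal() and take_signal() below
are hypothetical stand-ins for the two behaviours described above; they do
not reproduce the kernel's actual signal data structures or the real
sigispending()/issignal() signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit n of the mask stands for pending signal n (bit 0 unused). */

/*
 * Peek: report the lowest-numbered pending signal, or 0 if none.
 * The mask is passed by value, so the pending set is left untouched.
 */
int
peek_signal(uint32_t pending)
{
	for (int signo = 1; signo < 32; signo++) {
		if (pending & (UINT32_C(1) << signo))
			return signo;
	}
	return 0;
}

/*
 * Consume: report the lowest-numbered pending signal and clear it, so a
 * later delivery pass would no longer see it.
 */
int
take_signal(uint32_t *pending)
{
	int signo;

	assert(pending != NULL);

	signo = peek_signal(*pending);
	if (signo != 0)
		*pending &= ~(UINT32_C(1) << signo);
	return signo;
}
```

Choosing an errno for an interrupted sleep only needs the peek form. Using
the consuming form there is what would make the signal disappear before it
is delivered.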


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug to lose biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - an implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
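
The rule in the last item can be written as a small predicate. This is only
a sketch: struct wake_info and wake_needs_resched() are hypothetical names,
and it assumes that a numerically larger value means a higher priority,
which may not match the priority encoding in use at this revision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the state consulted when waking an LWP. */
struct wake_info {
	int  woken_pri;			/* priority of the LWP being woken */
	int  running_pri;		/* priority of the LWP on the target CPU */
	bool woken_has_kernel_pri;	/* woken LWP sleeps at kernel priority */
	bool target_cpu_idle;		/* target CPU is running its idle LWP */
};

/*
 * Return true if the wakeup should send an IPI to a remote CPU or
 * arrange for the current LWP to be preempted; otherwise the woken LWP
 * simply waits for the next normal scheduling point.
 */
bool
wake_needs_resched(const struct wake_info *wi)
{
	assert(wi != NULL);

	/* Case (b): an idle CPU should pick up the work right away. */
	if (wi->target_cpu_idle)
		return true;

	/* Case (a): kernel priority and higher than what is running. */
	return wi->woken_has_kernel_pri && wi->woken_pri > wi->running_pri;
}
```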


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.
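
Since a wait can now return zero without the awaited condition being true,
callers have to re-check their condition in a loop. The sketch below
simulates that pattern in user space. struct fake_event, fake_wait() and
wait_for_ready() are hypothetical and only model a sleep primitive that may
wake spuriously; they are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an event that is preceded by spurious wakeups. */
struct fake_event {
	int  spurious_left;	/* spurious wakeups still to simulate */
	bool ready;		/* the condition the caller is waiting for */
};

/*
 * Stand-in for a sleep that returns 0 on every wakeup, whether or not
 * the condition has actually become true.
 */
int
fake_wait(struct fake_event *ev)
{
	assert(ev != NULL);

	if (ev->spurious_left > 0)
		ev->spurious_left--;	/* woken, but nothing has changed */
	else
		ev->ready = true;	/* the real wakeup */
	return 0;
}

/*
 * Wait until the condition holds, re-testing it after every wakeup.
 * Returns the number of waits performed, or -1 if a wait failed.
 */
int
wait_for_ready(struct fake_event *ev)
{
	int waits = 0;

	assert(ev != NULL);

	while (!ev->ready) {
		if (fake_wait(ev) != 0)
			return -1;
		waits++;
	}
	return waits;
}
```

With the condition re-tested on each iteration, a spurious wakeup costs one
extra pass through the loop and nothing else.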


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.83 08-Oct-2023 ad

Oops, fix inverted test.


# 1.82 08-Oct-2023 ad

Ensure that an LWP that has taken a legitimate wakeup never produces an
error code from sleepq_block(). Then, it's possible to make cv_signal()
work as expected and only ever wake a singular LWP.


# 1.81 08-Oct-2023 ad

sleepq_block(): slightly reduce number of test+branch in the common case.


# 1.80 07-Oct-2023 ad

sleepq_uncatch(): fix typo that's been there since 2020, hello @thorpej lol:

- l->l_flag = ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
+ l->l_flag &= ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);


# 1.79 07-Oct-2023 ad

sleepq_uncatch(): clear LW_STIMO too, so that there's no possibility that
the newly non-interruptable sleep could produce EWOULDBLOCK (paranoia).


# 1.78 05-Oct-2023 ad

Resolve !MULTIPROCESSOR build problem with the nasty kernel lock macros.


# 1.77 04-Oct-2023 ad

Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.


# 1.76 23-Sep-2023 ad

Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.


# 1.75 23-Sep-2023 ad

- Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).


# 1.74 09-Apr-2023 riastradh

kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.73 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.72 29-Jun-2022 riastradh

ktrace(9): Fix mutex detection in ktrcsw.

On _entry_ to sleepq_block, l->l_syncobj is set so that ktrcsw
(ktr_csw) has the opportunity to detect whether it's a mutex or
rwlock. It is critical to avoid ktealloc when we're sleeping on a
mutex because we may be in softint context where ktealloc is
forbidden.

But after mi_switch, on _exit_ from sleepq_block, l->l_syncobj may
have been changed back to &sched_syncobj or something by
sleepq_remove, and so ktrcsw can no longer rely on l->l_syncobj to
determine whether we _were_ sleeping on a mutex or not.

Instead, save the syncobj in sleepq_block and pass it through as an
argument to ktrcsw.

Reported-by: syzbot+414edba9d161b7502658@syzkaller.appspotmail.com
Reported-by: syzbot+4425c97ac717b12495a2@syzkaller.appspotmail.com
Reported-by: syzbot+5812565b926ee8eb5cf3@syzkaller.appspotmail.com
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
Reported-by: syzbot+909a8e743c967d97f433@syzkaller.appspotmail.com
Reported-by: syzbot+e2a34bb5509bea0bba11@syzkaller.appspotmail.com
Reported-by: syzbot+faaea3aad6c9d0829f76@syzkaller.appspotmail.com


# 1.71 08-Apr-2022 andvar

fix various typos, mainly in comments, but also log messages, docs, game text.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptable
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't next structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that a cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleeq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assmued).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that lost biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides a
real-time (RT) and time-sharing (TS) queues, ready to support a POSIX
real-time extensions, and also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

	no options SCHED_4BSD
	options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
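
The wakeup rule in the last item can be written as a single predicate. The sketch below is an illustration only: wake_needs_resched() and its parameters are hypothetical, and it assumes that a larger number means a higher priority, which may not match the priority encoding in use at this revision.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: should waking a thread trigger an IPI to a
 * remote CPU or a preemption of the running thread?
 *
 * Assumption for this sketch: larger values mean higher priority.
 */
bool
wake_needs_resched(int woken_prio, bool woken_has_kernel_prio,
    int running_prio, bool target_cpu_idle)
{
    /* (a) kernel-priority thread that outranks the running thread */
    if (woken_has_kernel_prio && woken_prio > running_prio)
        return true;
    /* (b) the target CPU has nothing else to run */
    return target_cpu_idle;
}
```

A user-priority thread woken on a busy CPU therefore waits for the next ordinary scheduling decision instead of forcing one.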


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.78 05-Oct-2023 ad

Resolve !MULTIPROCESSOR build problem with the nasty kernel lock macros.


# 1.77 04-Oct-2023 ad

Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.


# 1.76 23-Sep-2023 ad

Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.


# 1.75 23-Sep-2023 ad

- Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).


# 1.74 09-Apr-2023 riastradh

kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
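
The log message does not state a motivation, but one plausible benefit of the split is that a failure report names the exact condition that was false. The user-space sketch below illustrates this under that assumption. CHECK() is a hypothetical stand-in that records the failed expression text instead of panicking; it is not the kernel's KASSERT().

```c
#include <stddef.h>

/* Text of the first failed check, or NULL if every check passed. */
static const char *last_failure = NULL;

/* Hypothetical stand-in for KASSERT(): record instead of panic. */
#define CHECK(expr)                                     \
    do {                                                \
        if (!(expr) && last_failure == NULL)            \
            last_failure = #expr;                       \
    } while (0)

/* Combined form: a failure reports the whole conjunction. */
const char *
check_combined(int a, int b)
{
    last_failure = NULL;
    CHECK(a > 0 && b > 0);
    return last_failure;
}

/* Split form: a failure reports only the condition that was false. */
const char *
check_split(int a, int b)
{
    last_failure = NULL;
    CHECK(a > 0);
    CHECK(b > 0);
    return last_failure;
}
```

With a = 1 and b = 0, the combined form reports the full conjunction while the split form reports only the check on b.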


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.73 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

	sleepq_enter(sq, l, lock);		// locks l
	...
	sleepq_enqueue(sq, ..., sobj, ...);	// assumes l locked, sets l_syncobj
	... (*)
	sleepq_block(...);			// unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.72 29-Jun-2022 riastradh

ktrace(9): Fix mutex detection in ktrcsw.

On _entry_ to sleepq_block, l->l_syncobj is set so that ktrcsw
(ktr_csw) has the opportunity to detect whether it's a mutex or
rwlock. It is critical to avoid ktealloc when we're sleeping on a
mutex because we may be in softint context where ktealloc is
forbidden.

But after mi_switch, on _exit_ from sleepq_block, l->l_syncobj may
have been changed back to &sched_syncobj or something by
sleepq_remove, and so ktrcsw can no longer rely on l->l_syncobj to
determine whether we _were_ sleeping on a mutex or not.

Instead, save the syncobj in sleepq_block and pass it through as an
argument to ktrcsw.

Reported-by: syzbot+414edba9d161b7502658@syzkaller.appspotmail.com
Reported-by: syzbot+4425c97ac717b12495a2@syzkaller.appspotmail.com
Reported-by: syzbot+5812565b926ee8eb5cf3@syzkaller.appspotmail.com
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
Reported-by: syzbot+909a8e743c967d97f433@syzkaller.appspotmail.com
Reported-by: syzbot+e2a34bb5509bea0bba11@syzkaller.appspotmail.com
Reported-by: syzbot+faaea3aad6c9d0829f76@syzkaller.appspotmail.com
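
The fix described above, saving the sync object before switching and passing the saved value along, can be modelled in user space. Everything in this sketch is hypothetical (struct thread, switch_away(), trace_csw() and the two block_* functions); it only mimics the shape of the problem and is not the kernel code.

```c
#include <stdbool.h>

/* Hypothetical, simplified models of the kernel structures. */
struct syncobj { bool is_mutex; };
struct thread { const struct syncobj *syncobj; };

const struct syncobj sched_obj = { false };
const struct syncobj mutex_obj = { true };
const struct syncobj cv_obj = { false };

/* Returns 1 if a trace record would be allocated, 0 if it is skipped. */
int
trace_csw(const struct syncobj *so)
{
    return so->is_mutex ? 0 : 1;
}

/* Models another thread resetting the field while we are switched out. */
void
switch_away(struct thread *t)
{
    t->syncobj = &sched_obj;
}

/* Re-reads the field after the switch: the mutex is no longer visible. */
int
block_rereading(struct thread *t)
{
    int records = trace_csw(t->syncobj);

    switch_away(t);
    records += trace_csw(t->syncobj);
    return records;
}

/* Saves the sync object on entry and uses the saved copy afterwards. */
int
block_saving(struct thread *t)
{
    const struct syncobj *so = t->syncobj;
    int records = trace_csw(so);

    switch_away(t);
    records += trace_csw(so);
    return records;
}
```

For a thread sleeping on the mutex object, block_rereading() would allocate one record on exit, while block_saving() allocates none.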


# 1.71 08-Apr-2022 andvar

fix various typos, mainly in comments, but also log messages, docs, game text.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.
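
The idea of recording the intent to catch signals in a per-LWP flag, so that it can change during the sleep, can be sketched as follows. This is a user-space illustration with hypothetical names (F_CATCHINTR, struct thread, sleep_begin(), sleep_uncatch(), sleep_end()); it does not reproduce the kernel's locking or its actual signal handling.

```c
#include <errno.h>
#include <stdbool.h>

#define F_CATCHINTR 0x1u    /* hypothetical flag bit */

struct thread {
    unsigned flags;
    bool signal_pending;
};

/* On entry to the sleep, record the caller's intent in the flag word. */
void
sleep_begin(struct thread *t, bool catch_p)
{
    if (catch_p)
        t->flags |= F_CATCHINTR;
    else
        t->flags &= ~F_CATCHINTR;
}

/* The queue owner may downgrade the sleep to uninterruptible. */
void
sleep_uncatch(struct thread *t)
{
    t->flags &= ~F_CATCHINTR;
}

/* On wakeup, consult the flag rather than the original argument. */
int
sleep_end(struct thread *t)
{
    bool catching = (t->flags & F_CATCHINTR) != 0;

    t->flags &= ~F_CATCHINTR;
    return (catching && t->signal_pending) ? EINTR : 0;
}
```

Because sleep_end() looks at the flag, a call to sleep_uncatch() made while the thread is asleep changes the outcome even though the sleep began as interruptible.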


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.
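
The ordering rule described above can be illustrated with a minimal user-space sketch. struct waiter and waitq_insert_sorted() are hypothetical names, and the sketch assumes that a larger number means a higher priority. The point is the comparison: skipping past every entry of greater or equal priority places the new waiter behind its equals.

```c
#include <stddef.h>

/* Hypothetical waiter record; not the kernel's lwp_t. */
struct waiter {
    int prio;               /* assumption: larger = higher priority */
    int id;
    struct waiter *next;
};

/*
 * Insert w into a list kept in descending priority order.  The scan
 * uses >=, so w lands after all existing waiters of equal priority,
 * which gives FIFO order among equals in every case.
 */
void
waitq_insert_sorted(struct waiter **head, struct waiter *w)
{
    struct waiter **pp = head;

    while (*pp != NULL && (*pp)->prio >= w->prio)
        pp = &(*pp)->next;
    w->next = *pp;
    *pp = w;
}
```

Using > instead of >= in the loop would insert the new waiter ahead of its equals, which is the LIFO behaviour the change removes.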


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.
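
The "either succeed or fail, not both" rule for timed waits can be modelled as a race in which only the first event to reach a sleeping thread decides the result. The sketch below is a single-threaded illustration with hypothetical names (struct sleeper, sleeper_start(), sleeper_wake(), sleeper_timeout()); the real code relies on locking that is not modelled here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one thread doing a timed wait. */
struct sleeper {
    bool asleep;
    int result;
};

void
sleeper_start(struct sleeper *s)
{
    s->asleep = true;
    s->result = 0;
}

/* A wakeup (signal or broadcast) takes effect only if still asleep. */
bool
sleeper_wake(struct sleeper *s)
{
    if (!s->asleep)
        return false;
    s->asleep = false;
    s->result = 0;
    return true;
}

/* A timer that fires after a wakeup has been taken changes nothing. */
bool
sleeper_timeout(struct sleeper *s)
{
    if (!s->asleep)
        return false;
    s->asleep = false;
    s->result = EWOULDBLOCK;
    return true;
}
```

If the wakeup arrives first, the later timer expiry is ignored and the wait reports success; if the timer arrives first, the wait reports the timeout.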


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleepq_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that lost biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.
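
A minimal user-space sketch of the batch idea described above, not the kernel code: every name here (model_queue, model_unsleep, model_wake_all) is hypothetical. It only illustrates how a boolean argument lets a batch caller defer the unlock to a single point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a locked queue of sleepers. */
struct model_queue {
	bool locked;
	int nwaiters;
	int unlocks;	/* how many times the lock was released */
};

/*
 * Remove one sleeper.  With cleanup true, also release the lock, as a
 * lone caller expects.  With cleanup false, leave that to the caller.
 */
static void
model_unsleep(struct model_queue *q, bool cleanup)
{
	assert(q->locked && q->nwaiters > 0);
	q->nwaiters--;
	if (cleanup) {
		q->locked = false;
		q->unlocks++;
	}
}

/* Batch wakeup: defer the cleanup and do it once at the end. */
static void
model_wake_all(struct model_queue *q)
{
	assert(q->locked);
	while (q->nwaiters > 0)
		model_unsleep(q, false);
	q->locked = false;
	q->unlocks++;
}
```

In this model a batch of any size releases the lock exactly once, which is the property the boolean is meant to preserve.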


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides a
real-time (RT) and time-sharing (TS) queues, ready to support a POSIX
real-time extensions, and also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.76 23-Sep-2023 ad

Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.


# 1.75 23-Sep-2023 ad

- Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).


# 1.74 09-Apr-2023 riastradh

kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.73 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

	sleepq_enter(sq, l, lock);		// locks l
	...
	sleepq_enqueue(sq, ..., sobj, ...);	// assumes l locked, sets l_syncobj
	... (*)
	sleepq_block(...);			// unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
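
The fix can be pictured with a small user-space model: the decision is made from a syncobj value handed in by the caller rather than from a field another thread may rewrite. All names below are hypothetical, and the model ignores locking entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_syncobj {
	bool is_mutex;
};

struct model_lwp {
	const struct model_syncobj *l_syncobj;	/* may change under us */
};

/* Decide from the caller-supplied syncobj, never from l->l_syncobj. */
static bool
model_want_csw_record(const struct model_syncobj *sobj)
{
	assert(sobj != NULL);
	return !sobj->is_mutex;
}

static bool
model_block(struct model_lwp *l, const struct model_syncobj *sobj)
{
	(void)l;	/* l->l_syncobj is deliberately not consulted */
	return model_want_csw_record(sobj);
}
```

Even if the field is reset to a non-mutex syncobj in the meantime, the model still declines to create a record for a mutex sleep.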


# 1.72 29-Jun-2022 riastradh

ktrace(9): Fix mutex detection in ktrcsw.

On _entry_ to sleepq_block, l->l_syncobj is set so that ktrcsw
(ktr_csw) has the opportunity to detect whether it's a mutex or
rwlock. It is critical to avoid ktealloc when we're sleeping on a
mutex because we may be in softint context where ktealloc is
forbidden.

But after mi_switch, on _exit_ from sleepq_block, l->l_syncobj may
have been changed back to &sched_syncobj or something by
sleepq_remove, and so ktrcsw can no longer rely on l->l_syncobj to
determine whether we _were_ sleeping on a mutex or not.

Instead, save the syncobj in sleepq_block and pass it through as an
argument to ktrcsw.

Reported-by: syzbot+414edba9d161b7502658@syzkaller.appspotmail.com
Reported-by: syzbot+4425c97ac717b12495a2@syzkaller.appspotmail.com
Reported-by: syzbot+5812565b926ee8eb5cf3@syzkaller.appspotmail.com
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
Reported-by: syzbot+909a8e743c967d97f433@syzkaller.appspotmail.com
Reported-by: syzbot+e2a34bb5509bea0bba11@syzkaller.appspotmail.com
Reported-by: syzbot+faaea3aad6c9d0829f76@syzkaller.appspotmail.com


# 1.71 08-Apr-2022 andvar

fix various typos, mainly in comments, but also log messages, docs, game text.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.
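
A minimal sketch of a priority-sorted insert that stays FIFO among equals, assuming a hypothetical singly linked list of waiters where a larger value means higher priority. It is an illustration of the ordering rule, not the sleepq_insert() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an LWP on a sleep queue. */
struct waiter {
	int prio;		/* higher value = higher priority */
	int id;			/* arrival order, for illustration */
	struct waiter *next;
};

/*
 * Insert w so the list stays sorted by descending priority, and
 * behind every waiter of equal priority (FIFO among equals).
 */
static void
sorted_insert(struct waiter **head, struct waiter *w)
{
	struct waiter **pp = head;

	assert(w != NULL);
	/* Skip everything of higher or equal priority. */
	while (*pp != NULL && (*pp)->prio >= w->prio)
		pp = &(*pp)->next;
	w->next = *pp;
	*pp = w;
}
```

Using `>=` in the scan is what places a newcomer behind existing waiters of the same priority, including the case where every waiter has equal priority.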


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

	cv_broadcast(cv);
	cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleepq_t and condvar_t.
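
The size effect follows from the <sys/queue.h> macros: a LIST head holds one pointer while a TAILQ head holds two. A small sketch with hypothetical type names (model_lwp, model_sleepq_list, model_sleepq_tailq), not the kernel definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct model_lwp {
	int id;
	LIST_ENTRY(model_lwp) l_list;	/* linkage when queued on a LIST */
	TAILQ_ENTRY(model_lwp) l_tailq;	/* linkage when queued on a TAILQ */
};

LIST_HEAD(model_sleepq_list, model_lwp);	/* one pointer: first */
TAILQ_HEAD(model_sleepq_tailq, model_lwp);	/* two pointers: first, last */

/* Count the waiters on a LIST-based queue. */
static int
model_sleepq_count(struct model_sleepq_list *sq)
{
	struct model_lwp *l;
	int n = 0;

	assert(sq != NULL);
	LIST_FOREACH(l, sq, l_list)
		n++;
	return n;
}
```

A LIST has no tail pointer, so appending at the end is no longer a constant-time operation, which fits with dropping the FIFO insertion mode.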


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug to lose biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant peformance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an exernally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides a
real-time (RT) and time-sharing (TS) queues, ready to support a POSIX
real-time extensions, and also prepared for the support of CPU affinity.

The following lines in the kernel config enables the SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.74 09-Apr-2023 riastradh

kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.73 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.72 29-Jun-2022 riastradh

ktrace(9): Fix mutex detection in ktrcsw.

On _entry_ to sleepq_block, l->l_syncobj is set so that ktrcsw
(ktr_csw) has the opportunity to detect whether it's a mutex or
rwlock. It is critical to avoid ktealloc when we're sleeping on a
mutex because we may be in softint context where ktealloc is
forbidden.

But after mi_switch, on _exit_ from sleepq_block, l->l_syncobj may
have been changed back to &sched_syncobj or something by
sleepq_remove, and so ktrcsw can no longer rely on l->l_syncobj to
determine whether we _were_ sleeping on a mutex or not.

Instead, save the syncobj in sleepq_block and pass it through as an
argument to ktrcsw.

Reported-by: syzbot+414edba9d161b7502658@syzkaller.appspotmail.com
Reported-by: syzbot+4425c97ac717b12495a2@syzkaller.appspotmail.com
Reported-by: syzbot+5812565b926ee8eb5cf3@syzkaller.appspotmail.com
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
Reported-by: syzbot+909a8e743c967d97f433@syzkaller.appspotmail.com
Reported-by: syzbot+e2a34bb5509bea0bba11@syzkaller.appspotmail.com
Reported-by: syzbot+faaea3aad6c9d0829f76@syzkaller.appspotmail.com


# 1.71 08-Apr-2022 andvar

fix various typos, mainly in comments, but also log messages, docs, game text.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.
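
The idea of tracking the intent in a flag, rather than in the catch_p
argument, might be sketched as below. All names and the bit value here are
hypothetical stand-ins chosen for illustration; the real LW_CATCHINTR
definition and the sleepq functions live in the kernel sources and differ in
detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_CATCHINTR	0x01u	/* hypothetical bit; not the kernel's value */

/* Hypothetical, minimal stand-in for an LWP's flag word. */
struct lwp_sketch {
	unsigned int flags;
};

/* On entry to the sleep: record the caller's intent in the flag word. */
static void
sk_block_enter(struct lwp_sketch *l, bool catch_p)
{
	assert(l != NULL);
	if (catch_p)
		l->flags |= SK_CATCHINTR;
	else
		l->flags &= ~SK_CATCHINTR;
}

/* The owner of the queue changes the intent while the LWP is asleep. */
static void
sk_uncatch(struct lwp_sketch *l)
{
	assert(l != NULL);
	l->flags &= ~SK_CATCHINTR;
}

/* On wakeup: consult the flag, not the original catch_p argument. */
static bool
sk_wake_checks_signals(const struct lwp_sketch *l)
{
	assert(l != NULL);
	return (l->flags & SK_CATCHINTR) != 0;
}
```

Because the wakeup path reads the flag, a call to sk_uncatch() made during
the sleep takes effect even though the sleep began with catch_p set.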


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.
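
A minimal sketch of priority-sorted insertion that stays FIFO among equal
priorities is shown below. It uses a plain singly linked list with
hypothetical names instead of the kernel's queue macros and LWP structures,
and assumes that a larger number means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter record; not the kernel's lwp_t. */
struct waiter {
	int pri;		/* assumed: larger value = higher priority */
	int id;			/* only used to observe the resulting order */
	struct waiter *next;
};

/*
 * Insert w so that the list stays sorted by descending priority.
 * Waiters of equal priority are skipped (>=), so w lands behind them:
 * FIFO order among equals, whether or not lower-priority waiters exist.
 */
static void
waiter_insert_sorted(struct waiter **head, struct waiter *w)
{
	struct waiter **pp;

	assert(head != NULL && w != NULL);

	pp = head;
	while (*pp != NULL && (*pp)->pri >= w->pri)
		pp = &(*pp)->next;

	w->next = *pp;
	*pp = w;
}
```

Using a strict comparison (>) in the loop instead would place a new waiter
ahead of its equals, which is the LIFO behaviour the change removes.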


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.
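
The "either succeed or fail, not both" rule for timed waits might be
sketched as follows. The struct and function are hypothetical; the only
detail taken from outside this log is that a timed-out cv_timedwait()
reports EWOULDBLOCK.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what happened while the LWP slept. */
struct wait_outcome {
	bool woken;		/* a cv_signal()/cv_broadcast() wakeup was taken */
	bool timer_fired;	/* the timeout expired at some point */
};

/* Decide the return value of a timed wait from the recorded outcome. */
static int
timed_wait_result(const struct wait_outcome *o)
{
	assert(o != NULL);

	if (o->woken)
		return 0;		/* success wins, even if the timer fired later */
	if (o->timer_fired)
		return EWOULDBLOCK;	/* timed out without taking a wakeup */
	return 0;			/* neither happened: nothing to report */
}
```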


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug to lose biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.73 29-Jun-2022 riastradh

sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com


# 1.72 29-Jun-2022 riastradh

ktrace(9): Fix mutex detection in ktrcsw.

On _entry_ to sleepq_block, l->l_syncobj is set so that ktrcsw
(ktr_csw) has the opportunity to detect whether it's a mutex or
rwlock. It is critical to avoid ktealloc when we're sleeping on a
mutex because we may be in softint context where ktealloc is
forbidden.

But after mi_switch, on _exit_ from sleepq_block, l->l_syncobj may
have been changed back to &sched_syncobj or something by
sleepq_remove, and so ktrcsw can no longer rely on l->l_syncobj to
determine whether we _were_ sleeping on a mutex or not.

Instead, save the syncobj in sleepq_block and pass it through as an
argument to ktrcsw.

Reported-by: syzbot+414edba9d161b7502658@syzkaller.appspotmail.com
Reported-by: syzbot+4425c97ac717b12495a2@syzkaller.appspotmail.com
Reported-by: syzbot+5812565b926ee8eb5cf3@syzkaller.appspotmail.com
Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
Reported-by: syzbot+909a8e743c967d97f433@syzkaller.appspotmail.com
Reported-by: syzbot+e2a34bb5509bea0bba11@syzkaller.appspotmail.com
Reported-by: syzbot+faaea3aad6c9d0829f76@syzkaller.appspotmail.com


# 1.71 08-Apr-2022 andvar

fix various typos, mainly in comments, but also log messages, docs, game text.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.
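
As a rough illustration of why the intent is kept in a per-LWP flag rather
than in the catch_p argument, the sketch below records the intent at sleep
time and tests the flag on wakeup, so the intent can be cleared while the
thread is asleep. All names here (TF_CATCHINTR, struct thread, sleep_begin(),
sleep_uncatch(), sleep_end_interrupted()) are hypothetical stand-ins and do
not reproduce the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TF_CATCHINTR	0x01	/* hypothetical analogue of LW_CATCHINTR */

/* Hypothetical stand-in for an LWP. */
struct thread {
	unsigned	t_flag;
	bool		t_sigpending;
};

/* Record the caller's intent to catch signals in the thread's flags. */
static void
sleep_begin(struct thread *t, bool catch_p)
{
	assert(t != NULL);
	if (catch_p)
		t->t_flag |= TF_CATCHINTR;
	else
		t->t_flag &= ~TF_CATCHINTR;
}

/* Switch an interruptible sleep to an uninterruptible one. */
static void
sleep_uncatch(struct thread *t)
{
	assert(t != NULL);
	t->t_flag &= ~TF_CATCHINTR;
}

/*
 * On wakeup, consult the flag instead of the original argument, so a
 * change made while asleep is honoured.
 */
static bool
sleep_end_interrupted(const struct thread *t)
{
	assert(t != NULL);
	return (t->t_flag & TF_CATCHINTR) != 0 && t->t_sigpending;
}
```

Had the wakeup path tested the original catch_p value, a call to
sleep_uncatch() made during the sleep would have had no effect.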


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.
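
The ordering rule stated above can be sketched with a singly linked list
kept in descending priority order. This is a minimal illustration, not the
kernel's sleepq_insert(): struct waiter and waitq_insert_sorted() are
hypothetical names, and the sketch assumes that a larger number means a
higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an LWP on a sleep queue. */
struct waiter {
	int		w_prio;		/* larger value = higher priority */
	int		w_id;
	struct waiter	*w_next;
};

/*
 * Insert w into a list kept in descending priority order.  The walk
 * skips every entry whose priority is >= w's, so w lands after all
 * existing waiters of equal priority: FIFO among equals, whether or
 * not lower-priority waiters are present.
 */
static void
waitq_insert_sorted(struct waiter **head, struct waiter *w)
{
	struct waiter **pp;

	assert(head != NULL && w != NULL);
	for (pp = head; *pp != NULL && (*pp)->w_prio >= w->w_prio;
	    pp = &(*pp)->w_next)
		continue;
	w->w_next = *pp;
	*pp = w;
}
```

Using ">" instead of ">=" in the loop condition would place the new waiter
ahead of its equals, which is the LIFO behaviour the commit removes.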


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that a cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.
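
The difference between consuming a pending signal and merely looking at it
can be sketched as follows. This is an illustration only: struct sigstate,
sig_peek() and sig_take() are hypothetical and far simpler than the kernel's
sigispending() and issignal(); pending signals are modelled as bits 1..31 of
one word.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: bit n set means signal n is pending (n in 1..31). */
struct sigstate {
	unsigned ss_pending;
};

/* Return the lowest pending signal number, or 0, without changing state. */
static int
sig_peek(const struct sigstate *s)
{
	int n;

	assert(s != NULL);
	for (n = 1; n < 32; n++) {
		if ((s->ss_pending & (1u << n)) != 0)
			return n;
	}
	return 0;
}

/* Return the lowest pending signal number, or 0, and clear it. */
static int
sig_take(struct sigstate *s)
{
	int n = sig_peek(s);

	if (n != 0)
		s->ss_pending &= ~(1u << n);
	return n;
}
```

Code that only needs to choose an error value for the interrupted call can
use the peeking form; the signal then remains pending for later delivery.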


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug to lose biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.71 08-Apr-2022 andvar

fix various typos, mainly in comments, but also log messages, docs, game text.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptable
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't next structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that a cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleeq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assmued).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that loses biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration on balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
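
The wakeup rule in the last item above can be read as a small predicate.
The following is only a user-space sketch of that rule, not the kernel
code: struct wake_ctx, its fields and wake_needs_kick() are hypothetical
names, and the sketch assumes that a larger number means a higher
priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the state consulted when waking an LWP. */
struct wake_ctx {
	bool	wc_kernel_prio;		/* woken LWP has kernel priority */
	int	wc_woken_prio;		/* priority of the woken LWP */
	int	wc_running_prio;	/* priority of the LWP now running */
	bool	wc_cpu_idle;		/* target (remote) CPU is idle */
};

/*
 * Return true if the waker should send an IPI or request preemption.
 * Assumption: larger number == higher priority.
 */
static bool
wake_needs_kick(const struct wake_ctx *wc)
{
	assert(wc != NULL);

	/* (b) an idle CPU should always be told about new work */
	if (wc->wc_cpu_idle)
		return true;

	/* (a) kernel priority AND higher than the running thread */
	return wc->wc_kernel_prio &&
	    wc->wc_woken_prio > wc->wc_running_prio;
}
```

Under this reading, a user-priority LWP woken on a busy CPU simply waits
for the next ordinary scheduling point.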


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.70 01-Jan-2022 msaitoh

s/happends/happens/ in comment.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.
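
The idea of recording the intent in a flag, instead of re-testing the
original argument, can be modelled in user space as below. This is a
sketch under stated assumptions, not the kernel implementation: struct
sk_lwp, SK_CATCHINTR and the sk_*() helpers are hypothetical stand-ins,
and the error path is reduced to returning EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_CATCHINTR	0x01u	/* hypothetical stand-in for LW_CATCHINTR */

/* Hypothetical, heavily reduced model of an LWP. */
struct sk_lwp {
	unsigned	sl_flags;
	bool		sl_sigpending;
};

/* Going to sleep: record the caller's intent (catch_p) in the flag. */
static void
sk_sleep_begin(struct sk_lwp *l, bool catch_p)
{
	assert(l != NULL);
	if (catch_p)
		l->sl_flags |= SK_CATCHINTR;
	else
		l->sl_flags &= ~SK_CATCHINTR;
}

/* While asleep: the owner of the queue may withdraw the intent. */
static void
sk_uncatch(struct sk_lwp *l)
{
	assert(l != NULL);
	l->sl_flags &= ~SK_CATCHINTR;
}

/* On wakeup: test the flag, not the catch_p value seen at sleep time. */
static int
sk_wake_error(const struct sk_lwp *l)
{
	assert(l != NULL);
	if ((l->sl_flags & SK_CATCHINTR) != 0 && l->sl_sigpending)
		return EINTR;
	return 0;
}
```

Because the test on wakeup reads the flag, a call to sk_uncatch() made
while the LWP sleeps changes the outcome, which a saved copy of catch_p
could not do.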


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.
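
The ordering described above can be illustrated with a plain linked
list. This is a minimal user-space sketch, not the sleepq_insert() code:
struct waiter and sorted_insert_fifo() are hypothetical, and a larger
w_prio is assumed to mean a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an LWP on a sleep queue; not lwp_t. */
struct waiter {
	int		 w_prio;	/* larger value == higher priority */
	int		 w_id;		/* arrival label, illustration only */
	struct waiter	*w_next;
};

/*
 * Insert 'nw' so that the list stays sorted by descending priority and
 * waiters of equal priority keep arrival (FIFO) order: skip every entry
 * whose priority is >= the new one, then link the new entry in.
 */
static void
sorted_insert_fifo(struct waiter **head, struct waiter *nw)
{
	struct waiter **pp = head;

	assert(head != NULL && nw != NULL);
	while (*pp != NULL && (*pp)->w_prio >= nw->w_prio)
		pp = &(*pp)->w_next;
	nw->w_next = *pp;
	*pp = nw;
}
```

Using >= rather than > in the loop is what keeps equal-priority waiters
in FIFO order; with > the new entry would land ahead of its peers, which
is the LIFO behaviour the commit message describes.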


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleepq_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that a cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that loses biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration on balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of e.g. a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.69 23-Oct-2020 thorpej

- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.
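
Illustration (not part of the commit): a minimal sketch of a sorted
insert that keeps FIFO order among waiters of equal priority. It uses a
plain singly linked list rather than the kernel's queue macros. struct
waiter and waitq_insert_sorted() are hypothetical names, and a larger
number is assumed to mean a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an LWP on a sleep queue; not the kernel's lwp_t. */
struct waiter {
	int		 w_id;		/* label used only for illustration */
	int		 w_pri;		/* assumed: larger value = higher priority */
	struct waiter	*w_next;
};

/*
 * Insert w so that the list stays sorted by descending priority and
 * waiters of equal priority keep their arrival (FIFO) order.  The walk
 * skips every waiter whose priority is greater than OR EQUAL to w's, so
 * w lands behind all existing waiters of the same priority.
 */
static void
waitq_insert_sorted(struct waiter **head, struct waiter *w)
{
	struct waiter **pp = head;

	assert(w != NULL);
	while (*pp != NULL && (*pp)->w_pri >= w->w_pri)
		pp = &(*pp)->w_next;
	w->w_next = *pp;
	*pp = w;
}
```

Using ">" instead of ">=" in the loop would place the new waiter ahead
of its equal-priority peers, which is the LIFO behaviour described
above.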


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.
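
Illustration (not part of the commit): the "either succeed or fail, not
both" rule for timed waits can be modelled as a small state machine in
which the first event decides the outcome. This is a sketch under
assumptions. The tw_* names are hypothetical, and EWOULDBLOCK is used as
the timeout error, as documented for cv_timedwait(9).

```c
#include <assert.h>
#include <errno.h>

enum tw_state { TW_SLEEPING, TW_WOKEN, TW_EXPIRED };

/* Hypothetical record of how one timed wait ended. */
struct twait {
	enum tw_state	tw_state;
};

/* A wakeup only counts if the wait has not already ended. */
static void
tw_wakeup(struct twait *tw)
{
	if (tw->tw_state == TW_SLEEPING)
		tw->tw_state = TW_WOKEN;
}

/* A timer that fires after a wakeup was taken is ignored. */
static void
tw_timer_fired(struct twait *tw)
{
	if (tw->tw_state == TW_SLEEPING)
		tw->tw_state = TW_EXPIRED;
}

/* Result seen by the waiter once it resumes. */
static int
tw_result(const struct twait *tw)
{
	assert(tw->tw_state != TW_SLEEPING);
	return tw->tw_state == TW_EXPIRED ? EWOULDBLOCK : 0;
}
```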


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-1-RELEASE netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that a cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.
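
Illustration (not part of the commit): a minimal sketch of the
"timo = 0 means no timeout" convention. sleep_expired() is a
hypothetical helper written for this note and is not a kernel function.
Times are plain tick counts.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a sleep that began at tick 'start' with timeout 'timo'
 * has expired at tick 'now'.  A timo of 0 means "wait forever", so such
 * a sleep never expires by time alone.
 */
static bool
sleep_expired(unsigned timo, unsigned start, unsigned now)
{
	if (timo == 0)
		return false;
	/* Unsigned subtraction stays correct if the tick counter wraps. */
	return now - start >= timo;
}
```

A caller that computes a remaining time and ends up with 0 must
therefore treat that case itself instead of passing 0 on, or the wait
becomes infinite.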


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that lost biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


# 1.68 21-May-2020 thorpej

In sleepq_insert(), in the SOBJ_SLEEPQ_SORTED case, if there are existing
waiters of lower priority, then the new LWP will be inserted in FIFO order
with respect to other LWPs of the same priority. However, if all other
LWPs are of equal priority to the LWP being inserted, the new LWP would
be inserted in LIFO order.

Fix this to always insert in FIFO order with respect to equal priority LWPs.

OK ad@.


# 1.67 08-May-2020 thorpej

Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptable
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.
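
The timo = 0 convention above has a practical consequence for callers that
compute a timeout from a deadline: a remaining time of zero must not be
passed through, or a bounded wait silently becomes an unbounded one. The
following is a minimal user-space sketch of that guard, not kernel code.
The names timo_is_infinite() and remaining_to_timo() are hypothetical and
exist only for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: models the documented convention that a
 * timeout of 0 ticks means "sleep with no timeout".
 */
static bool
timo_is_infinite(int timo)
{
	return timo == 0;
}

/*
 * Hypothetical helper: turn "ticks remaining until a deadline" into a
 * timo value for a bounded wait.  Returns false when the deadline has
 * already passed, so the caller can fail with a timeout instead of
 * passing 0 and sleeping forever.  On success *timo is non-zero.
 */
static bool
remaining_to_timo(int remaining, int *timo)
{
	if (remaining <= 0)
		return false;
	*timo = remaining;
	return true;
}
```

A caller would check remaining_to_timo() first and only sleep when it
returns true.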


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that lost biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import SCHED_M2, an implementation of a new scheduler based on the
original SVR4 approach, with some ideas about balancing and migration
taken from Solaris. It implements per-CPU run queues, provides real-time
(RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
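
The wakeup rule in the last item can be read as a simple predicate. The
sketch below is a user-space illustration only, not the kernel's code: the
struct wake_ctx type, its fields, and wake_needs_resched() are hypothetical
names, and it assumes that a numerically larger value means a higher
priority.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state consulted when waking an LWP. */
struct wake_ctx {
	bool woken_has_kernel_pri;	/* woken LWP sleeps at kernel priority */
	int woken_pri;			/* priority of the LWP being woken */
	int running_pri;		/* priority of the LWP running on the target CPU */
	bool target_cpu_idle;		/* target CPU has nothing to run */
};

/*
 * Returns true when the waker should send an IPI or request
 * preemption: (b) the target CPU is idle, or (a) the woken LWP has
 * kernel priority and outranks the LWP currently running.
 * Assumption: larger number == higher priority.
 */
static bool
wake_needs_resched(const struct wake_ctx *c)
{
	if (c->target_cpu_idle)
		return true;
	return c->woken_has_kernel_pri && c->woken_pri > c->running_pri;
}
```

In every other case the woken LWP simply waits for the target CPU's next
scheduling point.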


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of e.g. a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't next structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that a cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleeq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assmued).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug to lose biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant peformance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an exernally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
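
As a rough illustration of the first item above, inverting a priority space
can be modelled as a reflection around the top of the range. This is only a
sketch and is not taken from the NetBSD sources: the constant HYP_PRI_COUNT
and the function hyp_invert_pri() are hypothetical names chosen for the
example, and the real kernel arranges its levels into bands rather than
using a single flat mapping.

```c
#include <assert.h>

/* Hypothetical number of priority levels; not the kernel's real value. */
#define HYP_PRI_COUNT 128

/*
 * Map a priority from an "old style" space, where 0 is the most urgent
 * level, to an inverted space where 0 is the lowest priority.
 */
static int
hyp_invert_pri(int old_pri)
{
	/* Reject values outside the assumed range. */
	assert(old_pri >= 0 && old_pri < HYP_PRI_COUNT);
	return (HYP_PRI_COUNT - 1) - old_pri;
}
```

Applying the mapping twice returns the original value, so under these
assumptions the same helper converts in either direction.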


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - an implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
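
The wakeup rule in the last item can be read as a simple predicate. The
sketch below is an assumption-laden model, not the kernel's code: the name
hyp_wake_needs_resched(), the HYP_PRI_KERNEL boundary, and the convention
that larger numbers mean higher priority are all hypothetical and exist only
to make conditions (a) and (b) concrete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical boundary: priorities at or above this count as "kernel". */
#define HYP_PRI_KERNEL 64

/*
 * Decide whether waking an LWP should trigger an IPI to a remote CPU or
 * preemption of the current LWP. Larger numbers mean higher priority in
 * this sketch.
 */
static bool
hyp_wake_needs_resched(int woken_pri, int running_pri, bool target_cpu_idle)
{
	assert(woken_pri >= 0 && running_pri >= 0);

	/* (b) An idle remote CPU is always worth kicking. */
	if (target_cpu_idle)
		return true;

	/* (a) A kernel-priority LWP that outranks the running one. */
	return woken_pri >= HYP_PRI_KERNEL && woken_pri > running_pri;
}
```

In every other case the woken LWP is simply made runnable and waits for the
next ordinary scheduling point.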


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of, e.g., a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


Revision tags: bouyer-xenpvh-base1
# 1.66 19-Apr-2020 ad

Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

branches: 1.63.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.
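
To make the size argument concrete: with the sys/queue.h macros a LIST head
holds a single pointer while a TAILQ head holds two, which is presumably
where the saving comes from. The sketch below is a userland illustration
only. The hyp_* types and functions are hypothetical and are not the
kernel's sleepq_t or lwp_t; it assumes a platform that ships <sys/queue.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for an LWP; only the fields the sketch needs. */
struct hyp_waiter {
	int id;
	LIST_ENTRY(hyp_waiter) link;
};

/* Two candidate queue heads, declared so their footprint can be compared. */
LIST_HEAD(hyp_listq, hyp_waiter);
TAILQ_HEAD(hyp_tailq, hyp_waiter);

/* Insert at the head, so the most recent sleeper is found first (LIFO). */
static void
hyp_enqueue(struct hyp_listq *q, struct hyp_waiter *w)
{
	assert(w != NULL);
	LIST_INSERT_HEAD(q, w, link);
}

/* Remove and return the waiter at the head, or NULL if the queue is empty. */
static struct hyp_waiter *
hyp_dequeue(struct hyp_listq *q)
{
	struct hyp_waiter *w = LIST_FIRST(q);

	if (w != NULL)
		LIST_REMOVE(w, link);
	return w;
}
```

A LIST cannot append at the tail in constant time, which is the trade-off
that dropping the FIFO method accepts.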


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that lost biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.65 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't next structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.
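
Because timo = 0 means "wait forever", a caller that derives timo from an
absolute deadline has to make sure a nearly expired deadline never turns
into 0 by accident. The following is only a sketch in plain C under that
assumption; remaining_timo() is a hypothetical helper, not a NetBSD
interface, and the tick counters are ordinary unsigned integers.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a NetBSD API): compute the tick count to pass
 * as 'timo' when sleeping until an absolute deadline.
 *
 * Returns false if the deadline has already passed, in which case the
 * caller should not sleep at all.  Otherwise stores a value >= 1 in
 * *timo, so that an expired deadline is never mistaken for the
 * "timo = 0 means wait forever" case.
 */
static bool
remaining_timo(unsigned int now, unsigned int deadline, int *timo)
{
	int left = (int)(deadline - now);	/* tolerates counter wrap */

	if (left <= 0)
		return false;		/* expired: do not sleep */
	*timo = left;			/* always >= 1 here */
	return true;
}
```

A caller would sleep only when the helper returns true, and treat a false
return the same way as a timeout.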


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that loses biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
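
The wakeup rule in the last item above can be written as a small predicate.
This is only an illustrative sketch in plain C, not the kernel code:
struct wake_info and wake_needs_resched() are hypothetical names, and the
sketch assumes that a numerically larger value means a more urgent
priority.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a woken LWP and of the target CPU. */
struct wake_info {
	bool	kernel_pri;	/* woken LWP slept at kernel priority */
	int	woken_pri;	/* its priority (assumed: larger = more urgent) */
	int	running_pri;	/* priority of the LWP running on the CPU */
	bool	cpu_idle;	/* target CPU is idle */
};

/*
 * Decide whether waking an LWP should send an IPI or request
 * preemption: only if (b) the CPU is idle, or (a) the woken LWP has
 * kernel priority and outranks the currently running LWP.
 */
static bool
wake_needs_resched(const struct wake_info *w)
{
	if (w->cpu_idle)
		return true;
	return w->kernel_pri && w->woken_pri > w->running_pri;
}
```

Under this rule, an LWP woken at user priority never disturbs a busy CPU,
even if its priority is higher than that of the running LWP.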


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.64 10-Apr-2020 ad

- Make this needed sequence always work for condvars, by not touching the CV
again after wakeup. Previously it could panic because cv_signal() could
be called by cv_wait_sig() + others:

cv_broadcast(cv);
cv_destroy(cv);

- In support of the above, if an LWP doing a timed wait is awoken by
cv_broadcast() or cv_signal(), don't return an error if the timer
fires after the fact, i.e. either succeed or fail, not both.

- Remove LOCKDEBUG code for CVs which never worked properly and is of
questionable use.
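
The second item above (a timed wait that is woken and whose timer fires
afterwards) amounts to giving the wakeup precedence over the timeout. A
minimal sketch in plain C, assuming the two events are already known as
booleans; timed_wait_result() is a hypothetical name and not the kernel
implementation:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical result selection for a timed wait.  A legitimate wakeup
 * wins: if the LWP was woken, report success even when the timer fired
 * after the fact.  Only a wait that ended solely because of the timer
 * reports EWOULDBLOCK.
 */
static int
timed_wait_result(bool woken, bool timer_fired)
{
	if (woken)
		return 0;
	return timer_fired ? EWOULDBLOCK : 0;
}
```

With this ordering a waiter either succeeds or fails, never both.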


Revision tags: bouyer-xenpvh-base phil-wifi-20200406
# 1.63 26-Mar-2020 ad

Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: is-mlppp-base ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-8-2-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

branches: 1.51.18;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that loses biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
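
The last rule in the 1.14 entry above can be read as a simple predicate. The
following is a simplified userland sketch, not the kernel code: struct
wake_ctx and wake_needs_resched() are hypothetical names introduced for
illustration, and the sketch assumes that a larger number means a higher
priority (the convention adopted later, in revision 1.18).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state consulted when waking an LWP. */
struct wake_ctx {
	int  woken_pri;        /* priority of the LWP being awoken */
	int  running_pri;      /* priority of the LWP running on the target CPU */
	bool woken_is_kernel;  /* awoken LWP was sleeping at kernel priority */
	bool target_idle;      /* target CPU is currently idle */
};

/*
 * Return true if the waker should send an IPI or request preemption.
 * Assumption: larger value == higher priority.
 */
static bool
wake_needs_resched(const struct wake_ctx *c)
{
	assert(c != NULL);

	/* Case (b): an idle CPU is always worth notifying. */
	if (c->target_idle)
		return true;

	/* Case (a): kernel priority AND higher than the running LWP. */
	return c->woken_is_kernel && c->woken_pri > c->running_pri;
}
```

In this model a user-priority LWP woken on a busy CPU never triggers an IPI;
it simply waits for the next ordinary scheduling point.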


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of, e.g., a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.63 26-Mar-2020 ad

Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleepq_t and condvar_t.
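
The size reduction mentioned in the 1.63 entry follows from the <sys/queue.h>
head layouts: a LIST head holds a single pointer, while a TAILQ head holds
two. The sketch below is a userland illustration only. struct waiter,
list_sleepq_insert() and list_sleepq_wake_one() are hypothetical names, and
the sketch does not model any priority ordering the kernel may apply when
inserting.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a sleeping LWP; not the kernel's struct lwp. */
struct waiter {
	int id;
	LIST_ENTRY(waiter)  w_list;   /* linkage when queued on a LIST */
	TAILQ_ENTRY(waiter) w_tailq;  /* linkage when queued on a TAILQ */
};

LIST_HEAD(list_sleepq, waiter);    /* head: one pointer */
TAILQ_HEAD(tailq_sleepq, waiter);  /* head: first pointer plus tail pointer */

/* Queue a waiter at the head of a LIST-based queue (LIFO order). */
static void
list_sleepq_insert(struct list_sleepq *sq, struct waiter *w)
{
	LIST_INSERT_HEAD(sq, w, w_list);
}

/* Remove and return the first waiter, or NULL if the queue is empty. */
static struct waiter *
list_sleepq_wake_one(struct list_sleepq *sq)
{
	struct waiter *w = LIST_FIRST(sq);

	if (w != NULL)
		LIST_REMOVE(w, w_list);
	return w;
}
```

On typical platforms sizeof(struct list_sleepq) is one pointer and
sizeof(struct tailq_sleepq) is two, so every object that embeds a queue head
saves one pointer when it switches to a LIST. The cost is that a LIST has no
tail pointer, so appending at the end is no longer a constant-time operation.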


# 1.62 24-Mar-2020 ad

Update a comment.


Revision tags: ad-namecache-base3
# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller of mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that loses biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clauses 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug to lose biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2, a new scheduler implementation based on the original
SVR4 approach, with some inspiration for balancing and migration taken from
Solaris. It implements per-CPU run queues, provides real-time (RT) and
time-sharing (TS) queues, is ready to support the POSIX real-time
extensions, and is prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
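
The LIFO queueing described in 1.14 can be illustrated with a small userland
sketch. This is not the kernel's sleepq code: struct waiter, waitq_insert()
and waitq_wake_one() are hypothetical names, and the sketch only contrasts
head insertion (LIFO) with tail insertion (FIFO) using the TAILQ macros from
<sys/queue.h>. When waiters are woken from the head, LIFO insertion means the
most recently queued waiter is found first, without searching the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a sleeping LWP; not the kernel's struct lwp. */
struct waiter {
	int id;
	TAILQ_ENTRY(waiter) entry;
};

TAILQ_HEAD(waitq, waiter);

/* Enqueue a waiter: LIFO places it at the head, FIFO at the tail. */
static void
waitq_insert(struct waitq *q, struct waiter *w, bool lifo)
{
	assert(q != NULL && w != NULL);
	if (lifo)
		TAILQ_INSERT_HEAD(q, w, entry);
	else
		TAILQ_INSERT_TAIL(q, w, entry);
}

/* Dequeue the waiter at the head; returns NULL if the queue is empty. */
static struct waiter *
waitq_wake_one(struct waitq *q)
{
	struct waiter *w = TAILQ_FIRST(q);

	if (w != NULL)
		TAILQ_REMOVE(q, w, entry);
	return w;
}
```

With three waiters inserted in the order 1, 2, 3, waking from the head yields
3 first under LIFO insertion and 1 first under FIFO insertion.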


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.61 15-Feb-2020 ad

- Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.
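
One consequence of timo = 0 meaning "wait forever" is that a caller which
computes a remaining timeout must take care never to pass 0 by accident. The
helper below is a minimal sketch of that idea, not code from the tree:
ticks_until() and its tick-based arguments are hypothetical, and it assumes
the caller treats a negative result as "deadline already passed, do not
sleep".

```c
#include <assert.h>

/*
 * Hypothetical helper: convert an absolute deadline (in ticks) into a
 * timeout suitable for an interface where 0 means "no timeout".
 * Returns the number of ticks left (always >= 1), or -1 if the deadline
 * has already been reached, in which case the caller should not sleep.
 */
static int
ticks_until(int deadline, int now)
{
	int left = deadline - now;

	if (left <= 0)
		return -1;	/* never return 0: it would mean wait forever */
	return left;
}
```

A caller would check for -1 and report a timeout itself rather than handing
the value on as timo.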


# 1.60 01-Feb-2020 christos

fix incorrect type


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-0-RC2 netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that lost biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.59 26-Jan-2020 ad

Add SOBJ_SLEEPQ_NULL: means there is no TAILQ and the caller tracks the
sleeping LWPs some other way, which sleepq_*() doesn't know about.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

branches: 1.56.2;
Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in an MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that lost biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration taken from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
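
The LIFO item above can be pictured with a small userland sketch. It assumes a queue that always wakes from the head and chooses head or tail insertion from a flag; sketch_waiter, sketch_queue, sketch_insert and sketch_wake_one are hypothetical names, and the real sleep queue code also takes LWP priority into account, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical waiter record; not the kernel's lwp_t. */
struct sketch_waiter {
	int id;
	TAILQ_ENTRY(sketch_waiter) link;
};

TAILQ_HEAD(sketch_queue, sketch_waiter);

/*
 * Insert a waiter. With lifo set, the newest waiter goes to the head so
 * that it is the first one found by a wakeup; otherwise it is appended
 * and waiters are woken in arrival order.
 */
static void
sketch_insert(struct sketch_queue *q, struct sketch_waiter *w, bool lifo)
{
	assert(q != NULL && w != NULL);
	if (lifo)
		TAILQ_INSERT_HEAD(q, w, link);
	else
		TAILQ_INSERT_TAIL(q, w, link);
}

/* Wake one waiter: always take whatever sits at the head of the queue. */
static struct sketch_waiter *
sketch_wake_one(struct sketch_queue *q)
{
	struct sketch_waiter *w = TAILQ_FIRST(q);

	if (w != NULL)
		TAILQ_REMOVE(q, w, link);
	return w;
}
```

With LIFO insertion the most recently parked waiter is found at the head without a search, which is the effect the log entry describes as matching the order of insertion.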


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.58 12-Jan-2020 ad

Nothing uses l->l_sleeperr any more.


# 1.57 08-Jan-2020 ad

Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.


Revision tags: ad-namecache-base
# 1.56 17-Dec-2019 ad

Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.
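
A minimal sketch of the convention, assuming a hypothetical helper that turns a tick count into a deadline descriptor: a timo of zero means "no timeout is armed", not "time out immediately". The names sketch_deadline and sketch_deadline_from_timo are illustrative only and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: whether a timeout is armed, and for how long. */
struct sketch_deadline {
	bool armed;	/* false means wait indefinitely */
	int ticks;	/* meaningful only when armed is true */
};

/*
 * Interpret a timo argument the way the log entry describes:
 * zero selects an infinite wait, any positive value arms a timeout.
 */
static struct sketch_deadline
sketch_deadline_from_timo(int timo)
{
	struct sketch_deadline d;

	assert(timo >= 0);
	d.armed = (timo != 0);
	d.ticks = timo;
	return d;
}
```

A caller that computes a remaining time and may arrive at zero therefore has to handle that case itself, since passing zero on would request an unbounded sleep.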


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.
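
The difference between consuming and merely inspecting a pending signal can be sketched in userland with a bitmask. This is not the kernel's issignal() or sigispending(); sketch_sigpend, sketch_first_pending and sketch_take_signal are hypothetical names chosen to show why a peek is the right tool when the caller only needs to pick an errno.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pending set: bit (signo - 1) set means signo is pending. */
struct sketch_sigpend {
	uint32_t pending;
};

/* Peek: report the lowest-numbered pending signal, leave the set intact. */
static int
sketch_first_pending(const struct sketch_sigpend *sp)
{
	assert(sp != NULL);
	for (int signo = 1; signo <= 32; signo++) {
		if (sp->pending & (UINT32_C(1) << (signo - 1)))
			return signo;
	}
	return 0;
}

/* Consume: report the same signal, but also clear it from the set. */
static int
sketch_take_signal(struct sketch_sigpend *sp)
{
	int signo = sketch_first_pending(sp);

	if (signo != 0)
		sp->pending &= ~(UINT32_C(1) << (signo - 1));
	return signo;
}
```

In this model, a sleep path that calls the consuming variant just to decide between error codes would leave nothing for later delivery, which is the failure the log entry describes.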


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that lost biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.




# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug to lose biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of new scheduler, which is based
on the original approach of SVR4 with some inspirations about balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
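
The LIFO queueing described above can be sketched in userland with the
TAILQ macros from <sys/queue.h>. This is an illustrative sketch under
stated assumptions, not the kernel's implementation: struct waiter,
waitq_insert() and waitq_wake_one() are hypothetical names, and the
non-LIFO case is simplified to plain FIFO insertion at the tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for an LWP on a sleep queue (not lwp_t). */
struct waiter {
	int			w_id;
	TAILQ_ENTRY(waiter)	w_link;
};

TAILQ_HEAD(waitq, waiter);

/*
 * Enqueue a waiter.  With lifo set, the newest sleeper goes to the
 * head of the queue; otherwise it is appended at the tail (FIFO).
 */
static void
waitq_insert(struct waitq *q, struct waiter *w, bool lifo)
{
	assert(q != NULL && w != NULL);

	if (lifo)
		TAILQ_INSERT_HEAD(q, w, w_link);
	else
		TAILQ_INSERT_TAIL(q, w, w_link);
}

/*
 * Remove and return the waiter at the head of the queue, or NULL if
 * the queue is empty.  In LIFO mode this is the most recent sleeper.
 */
static struct waiter *
waitq_wake_one(struct waitq *q)
{
	struct waiter *w = TAILQ_FIRST(q);

	if (w != NULL)
		TAILQ_REMOVE(q, w, w_link);
	return w;
}
```

In LIFO mode the waiter that went to sleep last is found at the head of
the queue without any search, which is the effect the log entry
describes for LWPs parked via _lwp_park.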


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.56 17-Dec-2019 ad

Fix LOCKDEBUG panic on mutex_init().

Reported-by: syzbot+5a77339dc0a55e8d8caa@syzkaller.appspotmail.com


# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.



# 1.55 16-Dec-2019 ad

As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.


# 1.54 06-Dec-2019 ad

Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that loses biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2 - the implementation of a new scheduler, which is based
on the original approach of SVR4 with some inspiration for balancing
and migration from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for the support of CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.53 23-Nov-2019 ad

Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug that loses biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import SCHED_M2, an implementation of a new scheduler based on the
original approach of SVR4, with some inspiration for balancing and
migration taken from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of e.g. a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


# 1.52 21-Nov-2019 ad

Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.


Revision tags: phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 netbsd-8-1-RELEASE netbsd-8-1-RC1 isaki-audio2-base pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 netbsd-8-0-RELEASE phil-wifi-base pgoyette-compat-0625 netbsd-8-0-RC2 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 netbsd-8-0-RC1 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't next structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that a cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleeq_block(9) man page.


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assmued).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending() (which now returns the first pending signal) and does not
consume the signal.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.34 11-Aug-2008 yamt

sleepq_block: fix a bug to lose biglocks in the case of recursive calls.

this fixes pf rb-tree corruption on my box.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.33 17-Jun-2008 ad

branches: 1.33.2;
sleepq_block: add a comment.


Revision tags: yamt-pf42-base4
# 1.32 16-Jun-2008 ad

PR kern/38761: new (?) race in buffer cache code

sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.


Revision tags: yamt-pf42-base3
# 1.31 31-May-2008 ad

branches: 1.31.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.


# 1.30 26-May-2008 ad

Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.


Revision tags: hpcarm-cleanup-nbase
# 1.29 19-May-2008 rmind

- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).


Revision tags: yamt-pf42-base2 yamt-nfs-mp-base2
# 1.28 28-Apr-2008 martin

branches: 1.28.2;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.27 24-Apr-2008 ad

branches: 1.27.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.


# 1.26 22-Apr-2008 ad

Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.25 12-Apr-2008 ad

branches: 1.25.2;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.


# 1.24 05-Apr-2008 yamt

assertions.


# 1.23 28-Mar-2008 ad

sleepq_block: use callout_halt, as we have to wait for the callout to
stop (it might be running on another CPU). Otherwise, 'curlwp' could
exit before it completes.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.22 17-Mar-2008 ad

Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.


Revision tags: nick-net80211-sync-base mjf-devfs-base hpcarm-cleanup-base
# 1.21 14-Feb-2008 ad

branches: 1.21.2; 1.21.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.20 04-Jan-2008 ad

Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.19 05-Dec-2007 ad

branches: 1.19.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.18 06-Nov-2007 ad

branches: 1.18.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.


Revision tags: yamt-x86pmap-base4
# 1.17 14-Oct-2007 yamt

branches: 1.17.2; 1.17.4;
sleepq_remove: remove a stale comment.


Revision tags: yamt-x86pmap-base3 vmlocking-base
# 1.16 13-Oct-2007 rmind

sleepq_remove: Do not call sched_wakeup() when thread is running.
This fixes a locking problem, when l_cpu is changed in LSONPROC state.
Possible case was noted by <ad>.


# 1.15 09-Oct-2007 rmind

Import of SCHED_M2, an implementation of a new scheduler based on the
original approach of SVR4, with some inspiration for balancing and
migration taken from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!


Revision tags: nick-csl-alignment-base5 yamt-x86pmap-base2 yamt-x86pmap-base
# 1.14 06-Sep-2007 ad

branches: 1.14.2;
- Fix sleepq_block() to return EINTR if the LWP is cancelled. Pointed out
by yamt@.

- Introduce SOBJ_SLEEPQ_LIFO, and use for LWPs sleeping via _lwp_park.
libpthread enqueues most waiters in LIFO order to try and wake LWPs that
ran recently, since their working set is more likely to be in cache.
Matching the order of insertion reduces the time spent searching queues
in the kernel.

- Do not boost the priority of LWPs sleeping in _lwp_park, just let them
sleep at their user priority level. LWPs waiting for some I/O event in
the kernel still wait with kernel priority and get woken more quickly.
This needs more evaluation and is to be revisited, but the effect on a
variety of benchmarks is positive.

- When waking LWPs, do not send an IPI to remote CPUs or arrange for the
current LWP to be preempted unless (a) the thread being awoken has kernel
priority and has higher priority than the currently running thread or (b)
the remote CPU is idle.
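
The wakeup rule in the last item above can be read as a small predicate. The following is only a sketch under stated assumptions: struct wake_ctx and wake_needs_kick are hypothetical names, not kernel interfaces, and the sketch assumes that a larger number means a higher priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the facts the rule depends on. */
struct wake_ctx {
	int  woken_pri;        /* priority of the LWP being awoken */
	int  running_pri;      /* priority of the LWP running on the target CPU */
	bool woken_is_kernel;  /* awoken LWP has kernel priority */
	bool target_cpu_idle;  /* target CPU is currently idle */
};

/*
 * Return true if waking the LWP should send an IPI or request preemption:
 * (a) it has kernel priority and outranks the running LWP, or
 * (b) the target CPU is idle.
 * Assumption: larger value == higher priority.
 */
static bool
wake_needs_kick(const struct wake_ctx *c)
{
	assert(c != NULL);
	if (c->target_cpu_idle)
		return true;
	return c->woken_is_kernel && c->woken_pri > c->running_pri;
}
```

In every other case the awoken LWP simply waits for the target CPU to reschedule on its own, which is the cost saving the entry is after.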


# 1.13 31-Aug-2007 yamt

pull the following change from vmlocking branch.

revision 1.7.2.10
date: 2007/08/27 12:51:13; author: yamt; state: Exp; lines: +6 -7
sleepq_block: don't call lwp_unsleep twice.
(fix an assertion failure in lwp_unsleep.)


# 1.12 15-Aug-2007 ad

branches: 1.12.2;
Changes to make ktrace LKM friendly and reduce ifdef KTRACE. Proposed
on tech-kern.


Revision tags: matt-mips64-base
# 1.11 01-Aug-2007 ad

branches: 1.11.2; 1.11.4;
sleepq_block: if a pending signal is detected but has already been taken
by the time the calling thread tries to take it, don't return EINTR.
Instead return zero leading to a spurious wakeup.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.10 09-Jul-2007 ad

branches: 1.10.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements


# 1.9 17-May-2007 yamt

merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.


Revision tags: yamt-idlelwp-base8 thorpej-atomic-base
# 1.8 29-Mar-2007 ad

- cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of e.g. a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.


# 1.7 27-Feb-2007 yamt

branches: 1.7.2; 1.7.4; 1.7.6;
typedef pri_t and use it instead of int and u_char.


Revision tags: ad-audiomp-base
# 1.6 26-Feb-2007 yamt

implement priority inheritance.


# 1.5 17-Feb-2007 pavel

branches: 1.5.2;
Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.


# 1.4 15-Feb-2007 ad

branches: 1.4.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).


# 1.3 10-Feb-2007 yamt

remove function prototypes of sa_awaken.


Revision tags: post-newlock2-merge
# 1.2 09-Feb-2007 ad

Merge newlock2 to head.


Revision tags: newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base yamt-splraiseipl-base2
# 1.1 20-Oct-2006 ad

branches: 1.1.2;
file kern_sleepq.c was initially added on branch newlock2.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.51 03-Jul-2016 christos

GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.50 05-Sep-2014 matt

branches: 1.50.2;
Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base yamt-pagecache-base9 tls-earlyentropy-base rmind-smpnet-nbase rmind-smpnet-base tls-maxphys-base
# 1.49 24-Apr-2014 pooka

Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.48 08-Mar-2013 apb

branches: 1.48.6; 1.48.10;
Add comments saying that cv_timedwait and sleepq_block interpret
timo = 0 as an infinite timeout. This is already documented in the
cv_timedwait(9) man page, and there is no sleepq_block(9) man page.
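
As an illustration of the convention only, a caller-side helper might map the tick count to an optional deadline as sketched below. struct deadline and deadline_from_timo are hypothetical names, and the unsigned tick type is an assumption made to keep the example simple; this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical optional deadline: has_deadline == false means wait forever. */
struct deadline {
	bool     has_deadline;
	uint64_t tick;
};

/*
 * Convert a relative timeout in ticks into an absolute deadline.
 * timo == 0 is treated as "no timeout", matching the convention
 * described in the log entry.
 */
static struct deadline
deadline_from_timo(uint64_t now, unsigned int timo)
{
	struct deadline d = { false, 0 };

	if (timo != 0) {
		d.has_deadline = true;
		d.tick = now + timo;
	}
	return d;
}
```

A consequence of this convention is that a caller who computes a remaining time of zero must not pass it through unchanged, or the wait becomes unbounded.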


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.47 27-Jul-2012 matt

branches: 1.47.2;
Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.46 19-Feb-2012 rmind

Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base2 netbsd-6-base
# 1.45 28-Jan-2012 rmind

Remove obsolete ltsleep(9) and wakeup_one(9).


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.44 31-Oct-2011 yamt

branches: 1.44.2; 1.44.6;
- make lendpri/changepri similar.
- make common code a subroutine.


# 1.43 03-Sep-2011 christos

We need to process SA_STOP signals immediately, and not deliver them to
the process. Instead of re-structuring the code to do that, call issignal()
like before in that case. (tail -F /file^Zfg should not get interrupted).


# 1.42 31-Aug-2011 christos

PR/40594: Antti Kantee: Don't call issignal() here to determine what errno
to set for the interrupted syscall, because issignal() will consume the signal
and it will not be delivered to the process afterwards. Instead call
sigispending(), which now returns the first pending signal and does not
consume it.


# 1.41 27-Jul-2011 uebayasi

These don't need uvm/uvm_extern.h.


# 1.40 26-Jul-2011 yamt

sleepq_insert: call lwp_eprio only when necessary


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.39 13-May-2011 rmind

Sprinkle __cacheline_aligned and __read_mostly, make some functions static.


# 1.38 27-Apr-2011 plunky

drop inline here, to avoid C99 vs GNU differences


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.37 21-Oct-2009 rmind

branches: 1.37.4; 1.37.6;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.36 21-Mar-2009 ad

Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.


Revision tags: netbsd-5-1-5-RELEASE netbsd-5-1-4-RELEASE netbsd-5-1-3-RELEASE netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.35 15-Oct-2008 wrstuden

branches: 1.35.2; 1.35.4; 1.35.8;
Merge wrstuden-revivesa into HEAD.

